PERSONALITY TRAITS OF SOCIALLY SUCCESSFUL 
AND SOCIALLY UNSUCCESSFUL CHILDREN* 


MERL E. BONNEY 
North Texas State Teachers College 


Psychologists have conducted many studies with the purpose 
of describing the kind of person who is generally well accepted 
socially as compared with the one who is socially unsuccessful. 
Popular psychological writers have been especially prolific in 
this area. They have offered many suggestions on how to be 
the life of the party or how to win friends. Since professional 
psychologists, as well as the general public, have shown considerable
interest in the kind of personality which makes a favorable
impression on others, it seems that more research on inter-personal
relationships is urgently needed. It is the purpose of this article
to make a contribution on this subject.

Two methods of investigation were used in gathering the data 
for this study. One was trait ratings on the part of both teachers 
and pupils. The other was pupil choices of friends—a method 
which Moreno refers to as a sociometric test. The subjects 
were fourth-grade children in three schools of Denton, Texas. 
One of the schools was the Demonstration School associated with 
the North Texas State Teachers College. The other two were 
public schools—the Sam Houston and the Robert E. Lee. 
Denton is a town of approximately twelve thousand population 
located in the agricultural region of north Texas. All data were 
gathered during the school year of 1941-1942. 





* The writer wishes to express his appreciation to the school officials and 
teachers of Denton, Texas, who have given their wholehearted cooperation 
in this study. These include: Superintendent R. C. Patterson, Dr. J. C. 
Matthews, Mr. J. L. Yarbrough, Mr. J. D. Parnell, Mrs. Lulu Shoemaker, 
Mrs. N. R. Lukens, and Miss Ethel Miller. 
The method of obtaining the data on trait ratings will be 
described first. The rating scale used was a slight modification 
of the scale developed by Caroline McCann Tryon in connection 
with the Growth Study of Adolescents of the University of 
California Institute of Child Welfare. Some changes were 
made in the wording in order that the descriptions of the traits 
would be more suitable for fourth-grade children. The scale is 
composed of twenty traits each of which is paired with its 
opposite. Also each is accompanied by a sub-statement which 
makes its meaning more clear. Below are three examples taken 
from the scale. Since all the other traits are listed in subsequent 
tables they need not be given at this point. 


Is like this: Daring: is ready to take a chance at things that are new or unusual, is never worried or frightened.
About Average
Is like this: Afraid: is often worried or scared or won’t take a chance when something unexpected or unusual happens.

Is like this: Leader: knows how to start games or suggest something interesting to do so others like to join in.
About Average
Is like this: Follower: waits for somebody else to think of something to do and always likes to follow suggestions which others make.

Is like this: Welcomed: someone whom everybody likes; others are glad to have him around.
About Average
Is like this: Ignored: someone nobody seems to care much about; people do not notice when he is around.


The children had no difficulty in following this scale with the 
exception of those who were very poor readers. These had to be 
helped individually. 

All the children were given three copies of the scale and were 
told to rate three other pupils whom they regarded as friends in 
their room at school. Since children vary greatly in popularity, 
some were rated by eight or nine other pupils, and in one case by 
eleven. Also, a few children were not rated at all, since no one 
chose them as friends. In order to get ratings on all children, 
the names of those who received no ratings, or only one or two, 
were written on copies of the scale, and these were distributed to 
the pupils in each room with the direction to make the ratings 
as best they could. Some attempt was made to have a child 
rated by another one whom the teacher considered to be his 
friend, but in many cases this was not possible. However, the 
fact that the two children were not personal friends probably 
should not be considered a serious matter in the validity of the 
ratings, since in almost every case both children had been 
together in the same grade for more than one semester and in 
most cases for more than three years. No child was included in 
the ratings who had been in the group less than six weeks. 
Furthermore, not more than twenty per cent of children in any 
grade were involved in the assigned ratings. 

After the children had completed their ratings, each of the 
classroom teachers rated all their pupils on the same scale. The 
teachers did not know how the children had rated each other 
when they made their own ratings. 

After the data from the rating scales were tabulated, a system 
of scoring was utilized to arrive at a total score for each child. 
This system was as follows: 


Marked degree of given trait ............ 5
Above average of given trait ............ 4
Average amount of given trait ........... 3
Below average of given trait ............ 2
Marked absence of given trait ........... 1


These different degrees were defined as follows: 


1. Marked Degree—Three or more pupils agreed 100 per cent and 
the teacher agreed with the pupils. 

2. Above Average—Sixty per cent or more of the pupils (but not 
one hundred per cent) agreed and the teacher agreed with the pupils; 
or three or more pupil raters agreed one hundred per cent but the teacher 
disagreed or rated the child as average. 

3. Average—The pupils were divided (anything less than one hundred 
per cent agreement) and the teacher rated the child as average or dis- 
agreed with the predominant rating of the pupils. 
4. Below Average—Same standard as for ‘above average’ except for 
opposite trait in each pair. 

5. Marked Absence—Same standard as for ‘marked degree’ except 
for opposite trait in each pair. 


In those instances in which the pupil raters were evenly 
divided, the scales were tipped in the direction of the teacher’s 
rating. 
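The five degrees defined above amount to a decision rule for combining pupil and teacher ratings. The Python sketch below is one possible encoding of it, not the author's procedure. The labels `TRAIT`, `AVERAGE`, `OPPOSITE` and the function names are hypothetical, and three readings are assumptions: 100 per cent agreement among fewer than three pupils counts as ‘above average’ when the teacher agrees; ‘evenly divided’ means exactly half the pupils chose one side of the pair; and the two lower degrees mirror the two upper ones.

```python
from typing import Optional

# Hypothetical labels for the three positions a rater can mark on a trait pair.
TRAIT = "trait"        # left-hand description, e.g. Daring
AVERAGE = "average"    # About Average
OPPOSITE = "opposite"  # right-hand description, e.g. Afraid


def _upper_score(pupils: list, teacher: str, pole: str) -> Optional[int]:
    """Return 5 or 4 if the ratings meet the standard for `pole`, else None."""
    n = len(pupils)
    share = pupils.count(pole) / n
    if n >= 3 and share == 1.0:
        # Marked degree needs the teacher's agreement; otherwise it is above average.
        return 5 if teacher == pole else 4
    if share >= 0.6 and teacher == pole:
        return 4
    if share == 0.5 and teacher == pole:
        # ASSUMPTION: evenly divided pupils are tipped toward the teacher's rating.
        return 4
    return None


def composite_trait_score(pupils: list, teacher: str) -> int:
    """Composite score from 5 (marked degree) down to 1 (marked absence)."""
    if not pupils:
        raise ValueError("at least one pupil rating is required")
    high = _upper_score(pupils, teacher, TRAIT)
    if high is not None:
        return high
    low = _upper_score(pupils, teacher, OPPOSITE)
    if low is not None:
        return 6 - low  # mirror the scale: 5 -> 1, 4 -> 2
    return 3  # pupils divided, or teacher disagreed with the predominant rating


print(composite_trait_score([TRAIT, TRAIT, TRAIT], TRAIT))             # 5
print(composite_trait_score([TRAIT, TRAIT, TRAIT], AVERAGE))           # 4
print(composite_trait_score([OPPOSITE, OPPOSITE, AVERAGE], OPPOSITE))  # 2
print(composite_trait_score([TRAIT, AVERAGE, OPPOSITE], TRAIT))        # 3
```

Checking the upper side first is safe because both sides can qualify only when the pupils split exactly in half, and then only the side the teacher chose passes.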

Although the most typical number of child raters for each 
pupil was three, more than a third of the population in each 
grade had four or more pupil ratings upon which their composite 
score could be based. Approximately ten per cent had six or 
more ratings. With this many raters, together with the rating 
of the teacher, upon which to base the total score for each trait, 
it would seem that the measures obtained should be regarded 
as having a high degree of validity. 

Since the data on traits are to be related to different degrees of 
social acceptance and to different degrees of mutual attraction 
and rejection, it is necessary at this point to describe how these 
various forms of social relationships were measured. As previ- 
ously stated, the method used for this purpose was pupil choices. 
The situations used to obtain the choices are listed below. 


In the Demonstration School 


October—choosing those preferred as working companions on 
committees which were to be appointed for the semester. 

December—listing names of those to whom they would like to 
give Christmas presents. 

February—listing names of those to whom valentines were to 
be given, in order that the teacher would know how many sheets 
of colored paper to give to each child for making the valentines. 

March—names of three friends were written on rating sheets 
which were then used to rate the friends on twenty personal 
traits. 

April—selecting companions for making an arithmetic chart, 
and also for working on a sign reading project. 

May—listing names of all best friends throughout the school 
year, as well as the names of all the best leaders in the room 
during the school year. 

In the Sam Houston School 


November—listing names of all children who would be selected 
to remain in the room if all others had to leave. 

December—designating names of those to whom Christmas 
presents were to be given, as well as names of those to whom 
presents would be given if it were possible to do so. 

February 10—voting on the king and queen for a valentine party. 

February 14—determining the number of valentines each child 
received. This was done by taking the valentines out of the box 
and counting the number for each child before they were dis- 
tributed to the children. 

March—same as for Demonstration School. 

May—same as for Demonstration School for both friends and 
leaders. 


In the Robert E. Lee School 


November—voting for officers in a class club. 

December—same as for November, with all officers available 
for re-election. 

February—counting of valentines as in Sam Houston School. 

March—same as in other two schools. 

April—election of club officers. 

May—voting on the ‘best citizen’ in the room for the school 
year. Considerable emphasis had been given to citizenship and 
five or six votings had been held throughout the year on the 
‘best citizen.’ Same as for other two schools for both friends 
and leaders. 

In nearly all of the above situations there was no limit placed 
on the number of choices which could be made. This technique 
provides a more adequate measure of each child’s social accept- 
ance than is possible when choices are limited to only one, two, 
or three names. All the choosing situations were conducted by 
the classroom teachers. 

In order to state both general social acceptance and mutual 
friendships in numerical terms the following system of scoring 
was used: first choice—5 points, second—4 points, third—3 
points, fourth—2 points, fifth—1 point, and all other choices— 
1 point. (This point system was not used in the situation involving
the giving of valentines, since no order of choice was indicated. 
Each valentine counted two points.) The composite social 
acceptance score for each child was determined by converting 
his raw score in each choosing situation into a per cent, adding 
all his per cent scores, and then obtaining an average for each 
child. For example, one child in Demonstration School received 
the following series of per cent scores in the eight choosing situa- 
tions throughout the school year: 4.5, 1.3, 6, 1, .8, 2.2, 4.9, and 
8.9. This makes a total of 29.6. When this number was 
divided by eight, an average of 3.7 was obtained. This was the 
child’s final social acceptance score for the year. All the other 
children had similar scores which ranged from .37 to 12. The 
total scores for the children in the other two schools were obtained 
in the same manner, and since all scores were converted to per cents, 
the results for the three schools could be pooled and 
subdivided into quartiles for comparative purposes. 
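The point system, the averaging step, and the quartile split can be sketched in Python as below. The function names are hypothetical, and the base for ‘converting his raw score … into a per cent’ is an assumption: the article does not state it, so the sketch takes each child's share of all points awarded in the room in that situation. Tie handling in the quartile split is likewise assumed. The last lines reproduce the article's worked example (a total of 29.6 over eight situations, giving 3.7).

```python
from statistics import mean


def choice_points(rank: int) -> int:
    """Points for a choice at 1-based position `rank`: 5, 4, 3, 2, 1, then 1 for all later ones."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return max(6 - rank, 1)


VALENTINE_POINTS = 2  # each valentine received counted two points (no order of choice)


def raw_scores(ballots: dict) -> dict:
    """Sum the points each child receives; `ballots` maps a chooser to an ordered list of names."""
    scores = {}
    for chosen in ballots.values():
        for rank, name in enumerate(chosen, start=1):
            scores[name] = scores.get(name, 0) + choice_points(rank)
    return scores


def valentine_scores(counts: dict) -> dict:
    """Convert a count of valentines received into points."""
    return {name: VALENTINE_POINTS * n for name, n in counts.items()}


def to_per_cent(scores: dict) -> dict:
    # ASSUMPTION: the per cent is the child's share of all points awarded in the situation.
    total = sum(scores.values())
    return {name: 100 * s / total for name, s in scores.items()}


def acceptance_score(per_cent_scores: list) -> float:
    """Composite social acceptance score: the average of a child's per cent scores."""
    return mean(per_cent_scores)


def quartiles(final_scores: dict) -> list:
    """Rank children from highest to lowest score and cut the ranking into four groups."""
    ranked = sorted(final_scores, key=final_scores.get, reverse=True)
    n = len(ranked)
    return [ranked[i * n // 4:(i + 1) * n // 4] for i in range(4)]


example = [4.5, 1.3, 6, 1, .8, 2.2, 4.9, 8.9]  # per cent scores in eight situations
print(round(sum(example), 1), round(acceptance_score(example), 1))  # 29.6 3.7
```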

Different degrees of mutual attraction and rejection were 
determined by a further differentiation of the data used in arriv- 
ing at general social acceptance. In the follow-up study which 
is being made of these subjects, five different degrees of attraction 
and rejection have been determined. These are: very mutual, 
moderately mutual, weakly mutual, largely unreciprocated, and 
very unreciprocated. Since only the two most extreme groups 
are used in this report, and since the other classifications have 
been described in a previous publication, only the standards for 
the ‘very mutual,’ and the ‘very unreciprocated’ groups will be 
given here. 

The point of reference in determining the ‘very mutual’ 
friendships was the maximum score which one child would give 
to another if he voted for him in first place in every choosing 
situation during the year. Thus, in the Demonstration School 
there were eight choosing situations, which would make possible 
a maximum score of 40, since each first place vote counted five 
points. In the Sam Houston School the maximum score was 35, 
and in the Robert E. Lee School it was 40. The standard set up 
for a ‘very mutual friendship’ was that each child gave the other 
one a total score vote during the year equal to, or greater than, 
forty per cent of the maximum score. The ‘very unreciprocated’ 
group was made up of those combinations of pupils in which one 
gave the other one a total score vote equal to, or greater than, 
the standard for a very mutual friendship, and received in return 
a total vote which was less than twenty per cent of maximum. 
These standards were set up after the data were gathered, since 
it was not known in advance how frequently the children would 
vote for each other. 
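The two extreme standards reduce to threshold tests against the maximum possible score. The Python sketch below is an illustration under that reading; the function names are hypothetical. The pair totals in the usage lines (21 and 21; 18 and 2) are the group averages the article reports for the two extremes, used here only as sample inputs. Comparisons are done in integers so that a total exactly at forty per cent is not lost to floating-point rounding.

```python
def max_score(situations: int) -> int:
    """Largest total one child could give another: first place (5 points) in every situation."""
    return 5 * situations


def classify_pair(a_to_b: int, b_to_a: int, situations: int) -> str:
    """Label a pair of children from the yearly totals they gave each other."""
    top = max_score(situations)

    def strong(x: int) -> bool:
        return 100 * x >= 40 * top  # at least forty per cent of the maximum

    def weak(x: int) -> bool:
        return 100 * x < 20 * top  # less than twenty per cent of the maximum

    if strong(a_to_b) and strong(b_to_a):
        return "very mutual"
    if (strong(a_to_b) and weak(b_to_a)) or (strong(b_to_a) and weak(a_to_b)):
        return "very unreciprocated"
    return "neither extreme"


print(max_score(8), max_score(7))  # 40 35
print(classify_pair(21, 21, 8))    # very mutual
print(classify_pair(18, 2, 8))     # very unreciprocated
print(classify_pair(16, 8, 8))     # neither extreme
```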


TABLE I.—AVERAGE COMPOSITE RATINGS IN TWENTY PERSONAL 
TRAITS RECEIVED BY CHILDREN IN THE HIGHEST AND 
LOWEST QUARTILES ON THE BASIS OF SOCIAL ACCEPTANCE

* Critical ratios were obtained from the differences between the means of 
the upper and lower groups divided by the standard error of these differences. 


The validity of these two groups, as representing extremes in 
friendship, can best be established by citing figures on the extent 
to which the children voted for each other. When the results for 
the three fourth grades were combined, it was found that the very 
mutual friends gave each other an average vote for the year of 
twenty-one points. Thus the average was considerably above 
the highest minimum standard set, i.e., forty per cent of forty, 
or sixteen. In the other extreme group, those whose friendship 
was ‘very unreciprocated’ gave an average of eighteen points to 
those to whom they were attracted, but received in return only an 
average of two. These data would seem to justify the assertion 
that the two groups were well differentiated in respect to mutual 
attraction on the one hand and rejection on the other. 

We shall now consider the data obtained from these various 
measurements. Table I shows the average composite scores on 
the trait ratings received by the children in the upper and lower 
quartiles on the basis of social acceptance. In connection with 
these data attention will first be directed to a comparison of the 
upper and lower one fourths. Using the familiar formula 
$\sigma_d = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2}$, where $\sigma_{M_1}$ and $\sigma_{M_2}$ are the
standard errors of the two means, the statistical reliabilities of the
differences between the means of these two groups in all of the twenty 
traits were determined. Since standard error was used, a critical 
ratio of 3 shows complete statistical reliability. It will be 
recognized that the number of cases in the two groups is small, 
particularly for generalizing to other similar populations. However, 
in addition to the reliabilities given above there is the fact that 
for most of the traits which have a critical ratio of 3 or more, the 
data showed a consistently smaller average in each of the four 
groups from the upper on down to the lowest. This consistency 
increases the validity of the trait differences between the popular 
and unpopular children. Another important factor which 
increases the validity of the findings of this study is that the 
results obtained from the study of general social acceptance, and 
from the study of the mutual and unreciprocated friendships, as 
well as the teacher ratings taken by themselves, all support each 
other in certain major respects which will be described later. 
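The critical ratio described above, the difference between two means divided by $\sigma_d$, can be computed as in the Python sketch below. The function names and the scores are hypothetical (they are not data from this study), and the standard error of a mean is assumed to be the classical $\sigma / \sqrt{N}$ with the population standard deviation.

```python
import math
from statistics import mean, pstdev


def standard_error_of_mean(values: list) -> float:
    # ASSUMPTION: sigma_M = sigma / sqrt(N), using the population standard deviation.
    return pstdev(values) / math.sqrt(len(values))


def critical_ratio(upper: list, lower: list) -> float:
    """Difference between the group means divided by the standard error of that difference."""
    sigma_d = math.sqrt(standard_error_of_mean(upper) ** 2 + standard_error_of_mean(lower) ** 2)
    if sigma_d == 0:
        raise ValueError("both groups have zero variance; the ratio is undefined")
    return (mean(upper) - mean(lower)) / sigma_d


# Hypothetical composite scores on one trait for five children in each extreme group.
upper_fourth = [5, 4, 4, 5, 4]
lower_fourth = [3, 2, 3, 3, 2]
cr = critical_ratio(upper_fourth, lower_fourth)
print(round(cr, 2), cr >= 3)  # 5.81 True
```

By the article's convention a ratio of 3 or more would count as a completely reliable difference.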

Looking now at the critical ratios in the bottom row of 
Table I, it will be observed that the upper fourth in social 
acceptance is reliably superior to the lower fourth in the following 
ten traits: tidy, leadership, friendly, welcomed, good-looking, 
enthusiastic, happy, frequent laughter, at ease with adults, and 
active in recitations. 

What interpretation shall be given to these findings? In the 
first place it should not be overlooked that in exactly one half of 
the traits listed in Table I, no reliable differences were found 
between the extreme groups in popularity. It is also significant 
that extreme groups had to be used in order to obtain any 
appreciable number of reliable differences. It would seem that 
there is not a very close relationship between the degree of 
popularity in the group as a whole and possession of the traits 
measured in the rating scale used in this study. More will be 
made of this point later. 

It is worth while to note here the nature of the characteristics 
which were found to show a reliable difference between the two 
extreme groups. One fact which emerges is that the most 
popular children are more aggressive and overt in their responses. 
This is shown by the traits of leadership, enthusiastic, frequent 
laughter, and being active in recitations. Apparently the highest 
social recognition does not generally go to children who are 
submissive, or docile, or who are characterized chiefly by negative 
virtues. This point is further emphasized by noticing the differ- 
ences between the upper and lower groups in other aggressive 
traits than those found to show a statistically reliable difference. 
It can be seen that in every case the averages for the high group 
in the traits of talkative, attention-getting, bossy, fights, daring, 
active in games, and grown up are all slightly higher than for the 
low group. In four of these traits—talkative, bossy, daring, and 
grown up—the differences approach statistical reliability, since 
the critical ratios are all 2. 

The finding that the most popular children are definitely more 
characterized by aggressive or socially overt behavior traits than 
are the least popular, is another score against certain traditional 
moral teachings which over-emphasize obedience and conformity. 
It seems that to be well accepted a child, as well as an adult, 
must possess many positive attributes which enable him to make 
himself count in a group. Under fairly typical life situations, 
such as a public school, it is safe to say that any individual is 
popular far more because of what he does, than because of what 
he refrains from doing. If he does various things which cause 
him to stand out from the group and win admiration, he has a 
much better chance of being well accepted, even though he has 
some obnoxious personal defects, than does the person who has no 
offending personal traits, but who is unable to make his person- 
ality register on the group. In other words, the data of this 
study support the thesis that popularity is more tied up with 
marked abilities and strong personality traits than with negative 
virtues. 

The above emphasis on positive and aggressive traits should 
not, however, obscure the importance of friendly attitudes, and 
other factors, in popularity. One of the traits listed previously 
as among those showing a completely reliable difference between 
the high and low group is ‘friendly.’ Also ‘tidy’ and ‘good- 
looking’ are on this list. The findings on these two points bear 
out the emphasis which is usually given to personal appearance 
by popular writers on social success. The fact that the upper 
fourth in social acceptance was found to show a highly reliable 
difference over the lowest fourth in being ‘welcomed’ and ‘happy’ 
is certainly not surprising. Any other result would be difficult 
to explain. Also, it would probably be expected that the most 
socially capable children would show a distinct advantage in 
being ‘at ease with adults.’ A sense of social security on one age 
level no doubt contributes to a feeling of confidence with older 
age groups. 

Looking at the above results as a whole, it seems that the traits 
which proved most significant in differentiating between the 
popular and unpopular children may be organized into two 
syndromes. The first syndrome is composed of strong, aggressive 
personality traits such as leadership, enthusiasm, daring, and 
active participation in recitations. The second syndrome is not 
so definite but it is composed of traits which count the most in 
direct inter-personal contacts, such as a pleasing appearance, a 
cheerful disposition, and friendly attitudes. The traits most 
important in this second syndrome are: tidy, good-looking, 
frequent laughter, happy, friendly, and welcomed. 


The traits in the second syndrome are the ones which have been 
very generally emphasized in popular writings on friendship. 
Those in the first syndrome are the ones whose significance has 
usually been overlooked by popular writers. Their significance 
has also been frequently overlooked in the fields of moral and 
religious education where too great emphasis is often placed on 
conformity, nicety, and submission to authority. Of course, 
it may be that leaders in these fields are not greatly concerned 
over a child’s social success. No doubt there are points of view 
here which could be defended. To consider them would take 
our attention too far from the purpose of this report. It may be 
stated, however, that from the standpoint of the psychology of 
personal happiness, any kind of moral or religious education 
which is a handicap to the winning of friends, or to the attainment 
of group admiration, should certainly be subjected to critical 
examination. 

Let us now consider trait similarities and differences between 
‘very mutual’ and ‘very unreciprocated’ friendships. On the 
basis of the previously described standards there were twenty- 
two pairs of very mutual friends. After each child had been 
assigned his composite trait scores, it was necessary to decide 
upon a standard of reference which would represent a high degree 
of similarity in a given trait between the two members of a pair. 

It was decided at first to count a pair as being very similar in 
a trait in which both members received either a 4 or 5 composite 
score. This would mean that they were above average or supe- 
rior in this trait. However, this method was discarded because 
in a few traits a large proportion of the children received above 
average scores, particularly scores of 4. This was due either to 
the leniency of the raters, or to the possibility that in a few traits, 
such as tidy, the majority of the children were actually high. 
In order to get a basis for similarity which would represent above 
average possession of a given trait, it was decided to take the 
average of each class group as the standard of reference. Some 
of these averages were as low as 2.0, 2.25, and 2.5, as for the 
traits of bossy, attention getting, and fighting. Some were as 
high as 4.1 and 4.2, as for welcomed and enthusiastic. Of course, 
the averages were not exactly the same in the three schools. 
For instance, the average for enthusiastic was in one school 
(as given above) 4.2; in another it was 4.0; and in the third it 
was 3.6. Consequently, the exact standard for a high degree of 
similarity between pairs of mutual friends varied somewhat among the 
three schools, but it was always on the same 
basis in that both members of a pair had to have a composite 
score which was above the average for their particular group. 
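The similarity standard compares both members of a pair with their own class average, so the cut-off moves from school to school. The Python sketch below illustrates this; the function names are hypothetical, and reading a pair as ‘below’ only when both members fall below the average is an assumption. The class averages 4.2, 4.0, and 3.6 are the ones quoted for enthusiastic; the pair scores are invented for illustration.

```python
def pair_position(score_a: float, score_b: float, group_average: float) -> str:
    """Place one pair of friends relative to their own class average on a single trait."""
    if score_a > group_average and score_b > group_average:
        return "above"
    if score_a < group_average and score_b < group_average:
        return "below"  # ASSUMPTION: 'below' means both members are below the average
    return "neither"


def pair_percentages(pairs: list) -> tuple:
    """Per cent of pairs above and below average; each item is (score_a, score_b, class_average)."""
    positions = [pair_position(a, b, avg) for a, b, avg in pairs]
    n = len(positions)
    return 100 * positions.count("above") / n, 100 * positions.count("below") / n


# Class averages for 'enthusiastic' from the text; the pair scores are invented.
pairs = [(5, 5, 4.2), (4, 5, 4.2), (5, 5, 4.0), (3, 3, 3.6)]
print(pair_percentages(pairs))  # (50.0, 25.0)
```

Note that the pair (4, 5) in the school averaging 4.2 does not count as similar, since a score of 4 is not above that school's average.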

A further question was: How many pairs of the total of twenty- 
two should receive above average scores in a certain trait before it 
could be concluded that this trait was important in mutual 
attractions? An arbitrary decision was made in favor of sixty 
per cent. However, it was also decided that in those instances 
in which as many as fifty per cent of the pairs received scores 
above average and zero per cent received scores below average, 
this should also be taken as showing a significant degree of 
similarity. 
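The two-part cut-off can be written as a single predicate, as in the Python sketch below (the function name is hypothetical). The traits in `reported` use only the above and below percentages that the article states in its discussion of Table II; the final line uses invented values to exercise each branch of the rule.

```python
def similarity_is_significant(pct_above: float, pct_below: float) -> bool:
    """At least 60% of pairs above average, or at least 50% above with none below."""
    return pct_above >= 60 or (pct_above >= 50 and pct_below == 0)


# (per cent above, per cent below) as stated in the text for some traits.
reported = {
    "happy": (41, 0),
    "talkative": (18, 0),
    "attention-demanding": (0, 18),
    "bossy": (13, 45),
    "grown up": (0, 22),
}
for trait, (above, below) in reported.items():
    print(trait, similarity_is_significant(above, below))  # False for each trait

# Invented values showing each branch of the rule.
print(similarity_is_significant(60, 10), similarity_is_significant(50, 0),
      similarity_is_significant(55, 5))  # True True False
```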

On the basis of these standards, it can be seen from Table II 
that there were ten traits which proved to have the closest 
association with the mutual friendships. These ten are: quiet, 
tidy, daring, leadership, friendly, welcomed, good-looking, 
enthusiastic, laughter, and active in recitations. 

It will be noted that most of these ten traits are the same ones 
which were found to show the greatest difference between the 
upper and lower fourths in general social acceptance. The only 
differences are the traits of ‘happy’ and ‘at ease with adults’— 
each of which is on the list obtained from Table I, but not on the 
list from Table II, and the traits of ‘quiet’ and ‘daring’ which are 
on the second list but not on the first. 


TABLE II.—PER CENT OF TWENTY-TWO PAIRS OF VERY MUTUAL 
FRIENDS WHO WERE ABOVE AND BELOW THEIR GROUP 
AVERAGES IN TWENTY PERSONAL TRAITS 

[Column headings (the names of the twenty traits) are illegible in this copy; values are listed in the original column order.]

Per cent of pairs above group averages:
60, 18, 0, 13, 60, 36, 50, 54, 36, 18, 60, 91, 64, 50, 41, 50, 45, 73, 0, 22

Per cent of pairs below group averages:
5, 0, 18, 45, 0, 32, 0, 0, 9, 14, 0, 0, 0, 0, 0, 0, 14, 0, 23, 14

The above differences do not alter the major groupings as 
described in the previous section on general social acceptance. 
The two syndromes of traits are again evident. There is, first, 
the group of traits indicating outstanding competence and 
strong, positive attributes. The traits in this group are: daring, 
leadership, enthusiasm, and active in recitations. The fact that 
daring was found to be among the traits showing the greatest 
similarity between mutual friends bears out the previous statements
in respect to the importance of strong, positive characteristics
in personal attractions. Daring means very much the 
same as courageous, and it is difficult to imagine a courageous 
individual finding much emotional satisfaction in associations 
with a person who is characterized by timidity and fear. 

Similarity in leadership ability among the mutual friends 
should probably also be interpreted as meaning that it takes a 
capable person to satisfy the friendship needs of another capable 
person, except in cases of individuals who have ability but who 
are weak in emotional maturity. Evidence from sociometric 
studies shows that weak individuals are also more attracted to 
strong individuals than to other weak ones, but, since their 
attachment is seldom reciprocated, they suffer frustration of 
their emotional needs, or they do the best they can with others 
on their own level. 

Enthusiasm implies ‘pep,’ energy, and responsiveness. Accord- 
ing to the scale used a child with this trait “always seems to 
have a good time; seems to enjoy everything no matter where it 
is—in school, on the playground, at a party, everywhere.” It 
is not surprising that mutual friends should be significantly alike 
in such behavior. It is highly improbable that an enthusiastic 
child would get much satisfaction from a child who is generally 
listless, bored, or unresponsive. 

The data in Table II which show that seventy-three per cent 
of the pairs of mutual friends were characterized by active 
participation in recitations is probably to be explained on the 
basis of similarity in interests and intelligence, and also on the 
basis of the point made above; namely, that the capable, alert, 
and energetic children were attracted to others who had similar 
traits. It may be assumed that the attraction was due to the 
fact that complex personalities find their best satisfactions only 
in contacts with other complex personalities. This is by no 
means simply a matter of high IQ, but also includes social and 
emotional maturity. 

The second syndrome of traits found to characterize the mutual 
friends includes the following: quiet, tidy, friendly, welcomed, 
good-looking, and laughter. These are the ones most important 
in intimate, personal contacts. 

The trait of being ‘quiet’ probably should not be considered 
very crucial in determining mutual attractions. Its inclusion in 
the above list may be due to the fact that children who have 
mutual friends, and who are well accepted in their groups, feel 
secure and well adjusted and consequently are not restless. 

It is not surprising that the mutual friends were found to be 
quite similar in being tidy and good-looking. It is probably rare 
for two children (or adults) to establish a close relationship with 
each other if they differ very much in dress, personal grooming, 
and what is generally called ‘good-looking.’ 

The fact that most of the pairs of mutual friends received 
above average scores in being friendly, and that nearly all 
of them (ninety-one per cent) had high scores in being wel- 
comed in their groups, emphasizes a point previously made; 
namely, that the socially strong are attracted to the socially 
strong. 

The child who laughs frequently, who can tell and appreciate 
jokes, is characterized by a cheerful disposition. He buoys up 
the feelings of others and gives the impression that he is enjoying 
their company whether he is or not. Individuals with this kind 
of a disposition are much more apt to seek the company of each 
other than the company of those who have a depressed or sour 
disposition. 

There are some other points of interest in the data on mutual 
friends in Table II, other than the traits which showed the highest 
degree of similarity. It will be noted that only a small per- 
centage (eighteen) of the pairs was above the average of their 
groups in being talkative. However, not one of the pairs was 
below average in this respect. This shows that a moderate 
amount of talk was by far the most typical characteristic of the 
close friends. Notice should also be taken of the fact that not a 
single pair of the mutual friends was above the average of their 
groups in attention-demanding behavior, but also only eighteen 
per cent was below average. This shows again a moderate 
degree as the typical picture. 

Table II also shows that only a very small proportion (13 per 
cent) of the pairs was characterized as being bossy. In this trait 
forty-five per cent were below the average of their groups. This 
shows a rather heavy weighting against bossiness in inter-personal
attraction. It is interesting to note that almost the 
same proportion of the mutual friends was rated as being above 
average in fighting as was rated low in this trait. Apparently 
this means that fighting varies a great deal with the individual 
friendships. However, it should be stated that the group 
averages for fighting were low in all three schools. In each class 
the average composite score was less than 3. Consequently, 
even the children who were above average in this trait were not 
characterized by much fighting behavior. 

Activity in games does not show as high a degree of similarity 
as might be expected. This must mean that being active in 
games is not very closely related to the kind of emotional
satisfactions important in interpersonal attractions. 

In sense of humor the ratings of the mutual friends fall prima- 
rily in the middle range of their respective groups, rather than 
at either extreme. This may be due to the rather intangible 
nature of this trait which made it hard for the raters to be sure 
of their judgments, and, consequently, led to many average 
ratings. 

In the trait of happiness forty-one per cent of the pairs were 
rated above average and none below. This is not far from the 
standard set up for a significant similarity. No doubt this was 
a difficult trait to rate, especially for the children, who must 
have had rather vague ideas as to what is meant by happiness. 
Nevertheless, the results are in line with the major emphasis of 
findings on other traits, i.e., the very mutual friends are far more 
apt to be two happy individuals than two unhappy ones. 

Being at ease with adults is shown in Table II to have a rather 
uncertain relationship with the mutual friendships, since less than 
half of the pairs were above average and fourteen per cent were 
rated below. In the trait of being ‘grown up’, not a single pair 
was above average, while twenty-two per cent were below aver- 
age. This means that the great majority impressed others as 
acting according to their age level. This is certainly desirable 
as it shows a better social adjustment than either extreme. The 
same is true in respect to ‘older friends’. Most of the mutual 
pairs made friends with children of their own age. 

Let us now turn to a consideration of the trait similarities 
between those combinations of children in which one child 
showed a strong attachment to the other but the other one 
showed practically no interest in him. These combinations are 
referred to in this report as the ‘very unreciprocated’ friendships. 
This group of cases was included in this study in order to have a 
contrasting group to the very mutual friends. It was thought 
that results on trait similarities obtained from this second group 
would serve to validate or invalidate to some extent the results 
from the first group. 

Since both of the above groups were selected from the same 
population of children there was considerable duplication of 
names between the two groups. Nearly all this duplication was 
due to children on the ‘very mutual’ list being the ones sought 


464 The Journal of Educational Psychology 


after on the other list. In a few cases a child from the ‘very 
mutual’ list was the source of two unreciprocations in the second 
group. Two of them were the source of three unreciprocations. 
Only one child from the ‘very mutual’ group was in the position 
of being unreciprocated in the second list. Of course, there 
were a good many pupils in the unreciprocated combinations 
who were not involved in the mutual friendships at all. 

In arriving at a basis for similarity between the pairs of unre- 
ciprocated friends, the same standard was used as was used for 
the mutual friendships, i.e., the per cent above and below 
respective group averages. The data bearing on the trait 
similarities in this group are given in Table III. 

In comparing the data of Table III with those of Table II 
relating to very mutual friendships, what are the most striking 
contrasts which appear? First, there is only one trait—that of 
tidy—on which fifty per cent or more of the pairs of ‘very unre- 
ciprocated’ friends received composite ratings which were above 
the averages of their respective groups. The fact that this one 
trait of tidiness or neatness was found so frequently in both 
extreme groups probably means that it is so common among 
school populations that it has but little differentiating value in 
determining friendships. 

It is significant that among the unreciprocated friends there 
was only one trait for which fifty per cent or more of the pairs 
received ratings above the averages of their groups, whereas 
among the very mutual friends there were six traits for which 
sixty per cent or more received such ratings, and ten for which 
fifty per cent or more received above-average ratings. Such
contrasting results are evidence that these traits must play an
important rôle in determining mutual attractions. This conclusion
is further borne out by the fact that among the unreciprocated
friends there was, in every trait, a percentage of the
pairs below average, whereas among the mutuals there were 
eleven traits in which not a single pair was rated below average. 
The much greater scatter of percentages of agreement among the 
unreciprocated friendships shows the lack of significant trait 
similarities. 

What practical implications arise from the above finding that 
mutual friendships are most likely to be formed between children 
who possess strong, aggressive personality traits, who are
outstanding in leadership and in class recitations, and who are at
the same time friendly? As emphasized in a previous paragraph, 
one implication is that popularity and winning of friends are not 
the superficial things that they are often assumed to be; rather 
they are tied up with the most basic qualities of personality and 
character. Reading a book on how to win friends and influence 
people cannot possibly have the value which many people are 


TABLE III.—PER CENT OF THIRTY-SEVEN PAIRS OF VERY
UNRECIPROCATED FRIENDS WHO WERE ABOVE AND BELOW
THEIR GROUP AVERAGES IN TWENTY PERSONAL TRAITS
led to believe it will have. To parents and teachers who are
concerned with the social success of children, the admonition is
to develop in them a wide range of abilities, and to teach them
any skill whereby they can make a contribution to their respective
groups. Also, enough leeway in group control should be allowed
to permit the development of some daring and initiative and some
socially approved aggressiveness. But this is not enough.
Ability alone is no guarantee of being liked. There must also 
be skill in the art of friendly intercourse. We know that some 
leaders among both children and adults have a scant number of 
mutual friends. A few have none. There is a warning in this
point for those teachers and parents who assume that, because a
child has an outstanding ability which others admire and may
occasionally be elected to positions of leadership, he is well
liked and is on the road to personal happiness. Although this
assumption would generally be true, there are enough exceptions
to warn against its uncritical acceptance. The attitudes and
kinds of behavior essential to friendliness must also be developed.

It is not within the scope of this article to enter into a discussion 
of ways and means to develop the art of friendly intercourse. 
It can be safely assumed, however, without extended discussion, 
that one of the most effective avenues to the attainment of this 
art lies in the achievement of many kinds of competence and the 
development of strong, positive personality traits. The person 
who senses that he possesses some kind of superiority in a group 
is certainly in a much better frame of mind to learn friendly 
attitudes and techniques than is one who lacks any kind of 
superiority or feels definitely inferior. It is also true that those 
who are friendly in their group contacts are the ones who are 
most likely to be helped by others to attain abilities and leader- 
ship positions. Thus there is a reciprocal relationship between 
the two syndromes of traits found above to be most significant in 
inter-personal attractions. 

One more topic remains to be considered; namely, the teacher 
ratings taken by themselves. It will be recalled that the teacher 
ratings were included in the composite scores which have been 
considered up to this point. Since some would probably consider
the teacher ratings more valid than those of the children, it was 
thought advisable to isolate the teacher ratings for separate con- 
sideration in order to see if these ratings supported the results 
obtained from the composite scores. Attention will now be 
directed to this point. 

First, what did the teacher ratings taken separately show in
respect to the four social acceptance quartiles used in Table I?
An examination of the data showed that the children in the upper 
fourth were given markedly superior ratings over the lowest 
fourth in eleven traits out of the twenty. These traits were: 
talkative, leadership, friendly, welcomed, good-looking, enthu- 
siastic, happy, frequent laughter, at ease with adults, active in 
recitations, and grown-up. The basis for a rating being ‘mark- 
edly superior’ was that the number of children in the upper group 
who received high ratings in a given trait was at least twice as 
great as in the low group. (A high rating was a check at the 
extreme left of the scale.) As a matter of fact, there were seven 
traits—namely, leadership, welcomed, good-looking, enthusiastic,
at ease with adults, active in recitations, and grown-up—for
which the number of high ratings in the upper group was three 
or four times as great as in the low group. Furthermore, no 
trait was considered for comparison between the two groups 
unless more than half of the children in the upper group received 
high ratings in this trait. Consequently, in each of the above 
mentioned eleven traits there were eleven or more children out 
of the twenty in the upper fourth who received high teacher 
ratings, and in addition the frequency of these high ratings was 
at least twice as great—and in most instances three or four times 
as great—as the frequencies in the lowest fourth. 
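
The two-part rule for a 'markedly superior' rating can be written out as a small test. The sketch below assumes that, for each trait, the number of children in the upper and lowest fourths who received a high teacher rating (a check at the extreme left of the scale) has already been counted; the function name and the example counts are hypothetical and are not data from this study.

```python
def markedly_superior(upper_high: int, lower_high: int, upper_n: int = 20) -> bool:
    """Apply the two conditions described in the text (hypothetical helper).

    upper_high: children in the upper fourth with a high teacher rating
    lower_high: children in the lowest fourth with a high teacher rating
    upper_n:    size of the upper fourth (twenty children in this study)
    """
    # Condition 1: more than half of the upper group must be rated high.
    if upper_high * 2 <= upper_n:
        return False
    # Condition 2: the upper group must have at least twice as many
    # high ratings as the lowest group.
    return upper_high >= 2 * lower_high


# Hypothetical counts of high ratings: (upper fourth, lowest fourth).
example_counts = {
    "trait_a": (14, 4),   # qualifies: 14 > 10 and 14 >= 2 * 4
    "trait_b": (12, 8),   # fails: 12 < 2 * 8
    "trait_c": (9, 2),    # fails: only 9 of 20 in the upper group
}

for trait, (upper, lower) in example_counts.items():
    print(trait, markedly_superior(upper, lower))
```

Stating the rule this way makes explicit that a trait must clear both bars: a large ratio alone is not enough if fewer than eleven of the twenty popular children were rated high.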

Another fact adding validity to the trait differences between 
the popular and unpopular children was that when all of them 
were arranged in quartiles on the basis of social acceptance (as in
Table I), the teacher ratings in all of the above eleven traits
except one (friendly) were consistently lower in each succeeding 
quartile. This consistency increases confidence in the trait 
differences between the four groups. Furthermore, it should 
not be overlooked that these trait ratings were obtained from 
three different teachers who were working entirely independently 
of each other with their own groups of children. The fact that 
the ratings from three different sources held together as well as 
they did in respect to the different degrees of social acceptance 
certainly adds significance to the findings. 
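
The requirement that ratings be lower 'in each succeeding quartile' amounts to a strict decrease from the most to the least accepted fourth. A minimal sketch, assuming the quantity compared is the number of high teacher ratings per quartile (the text does not say exactly which summary was used); the function name and the counts are placeholders.

```python
from typing import Sequence


def declines_each_quartile(values: Sequence[float]) -> bool:
    """True if each quartile's value is lower than the one before it.

    values are ordered from the highest social-acceptance quartile
    to the lowest (hypothetical helper, not from the study).
    """
    return all(later < earlier for earlier, later in zip(values, values[1:]))


print(declines_each_quartile([15, 11, 7, 4]))   # placeholder counts
print(declines_each_quartile([15, 16, 7, 4]))   # placeholder counts
```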

It may now be asked how well the teacher ratings alone agreed
with the results from the composite scores given in Table I.
The extent of agreement was very high. Of the nine traits found
to show a reliable difference between the most and least popular
children in Table I, eight were also found on the list obtained
from the teacher ratings alone. The only exception was the 
trait ‘tidy.’ This exception, however, does not mean any real 
disagreement between the teachers’ ratings and the composite 
scores. The basis for this statement is that all of the most 
popular children were rated by the teachers as tidy, but since 
fifteen of the least popular were also rated as tidy, the difference 
could not be considered significant. 

The uniformity of results obtained from the composite scores
and from the teacher ratings taken alone adds emphasis to the previous
discussion of the importance, on the one hand, of strong,
positive traits in social acceptance, and, on the other hand, of
the syndrome of traits related to intimate personal contacts.
Both groups of characteristics are undoubtedly necessary to a
high degree of popularity.

It may now be asked what was the extent of agreement between 
the composite scores of the very mutual and the very unrecipro- 
cated friendships, on the one hand, and the teacher ratings taken 
alone, on the other. When the teachers’ ratings for the twenty 
pairs of mutual friends were matched, it was found that in only 
six traits could the degree of similarity between the pairs be con- 
sidered noteworthy. The basis for a noteworthy similarity was 
that in sixty per cent or more of the pairs the teachers gave a high 
positive rating to both members of the pairs. The six traits 
were: ‘quiet,’ ‘tidy,’ ‘avoids fights,’ ‘welcomed,’ ‘enthusiastic,’ 
and ‘happy.’ The three traits showing the highest degree of 
agreement between the pairs were ‘quiet,’ ‘tidy,’ and ‘avoids 
fights.’ The percentages were, respectively, eighty-two, sixty- 
four, and eighty. All teacher ratings were made without any 
knowledge of which children were mutual friends on the basis of 
the pupil choices. 
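
The 'noteworthy similarity' standard, and the comparison with the unreciprocated pairs made in the next paragraph, can be sketched as follows. This assumes each pair is recorded as two booleans (whether each member received a high teacher rating on one trait); the helper names and the example pairs are hypothetical.

```python
def share_both_high(pairs):
    """Fraction of pairs in which both members received a high rating.

    pairs: list of (member_a_high, member_b_high) booleans for one trait.
    """
    both = sum(1 for a, b in pairs if a and b)
    return both / len(pairs)


def is_noteworthy(pairs, threshold=0.60):
    """Sixty per cent or more of the pairs rated high on both members."""
    return share_both_high(pairs) >= threshold


def discriminates(mutual_pairs, unreciprocated_pairs):
    """A trait separates the groups only if it is noteworthy among the
    mutual friends but not among the unreciprocated ones."""
    return is_noteworthy(mutual_pairs) and not is_noteworthy(unreciprocated_pairs)


# Twenty hypothetical pairs: 13 both high, 4 one high, 3 neither.
example_pairs = [(True, True)] * 13 + [(True, False)] * 4 + [(False, False)] * 3
print(share_both_high(example_pairs), is_noteworthy(example_pairs))
```

Under this rule a trait such as 'tidy', which the text reports as frequent in both groups, would be noteworthy in each group and therefore would not discriminate.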

Before considering the implications of the teacher ratings on 
mutual friends, it will be best to state the findings on the teacher 
ratings of the unreciprocated friendships. In the latter group 
there were only three traits on which sixty per cent or more of the 
pairs received high ratings. These traits were: ‘quiet,’ ‘tidy,’ 
and ‘avoids fights.’ The percentages of pairs receiving high 
ratings on these traits were, respectively, 81, 73, and 73. These 
figures correspond very closely to the percentages of very mutual 
friends who received high ratings on these same traits. This 
result must be interpreted as meaning that, in so far as the teacher 
ratings are considered valid, these three traits do not play an 
important rôle in determining friendships. As previously stated
in discussing composite score differences between the mutual 
and unreciprocated friends, what is probably true is that such 
traits as the three above are so common among a typical public- 
school population that they are not critical factors in determining 
inter-personal attractions. It will be recalled, however, that the 
trait of being quiet was found to show a reliable difference in 
composite score between the upper and lower fourths in general 
social acceptance. 







After eliminating the above three traits from consideration, 
only three remain as being significant in discriminating between 
very mutual and very unreciprocated friendships from the stand- 
point of teacher ratings alone. These are: ‘welcomed,’ ‘enthu- 
siastic,’ and ‘happy.’ The first two were also found among the 
most important ones when the composite scores were used. 

Looking back over the entire body of evidence from both
composite scores and teacher ratings taken alone, it may be concluded
that the following traits were found most important in
discriminating between popular and unpopular children from
the standpoint of general social acceptance: ‘leadership,’ ‘enthusiasm,’
‘active in recitations,’ ‘friendly,’ ‘welcomed,’ ‘good-looking,’
‘frequent laughter,’ ‘happy,’ and ‘at ease with adults.’
These traits may be organized into the two syndromes previously
described: strong, positive traits on the one hand, and cheerful,
friendly attitudes on the other.

The traits found to be next in order of importance as attributes 
of popular as compared with unpopular children were: ‘daring,’ 
‘talkative,’ ‘tidy,’ and ‘grown up.’ 

The traits which proved to have least value in discriminating
between the popular and unpopular groups were: ‘quiet,’ ‘atten- 
tion-getting,’ ‘bossy,’ ‘fights,’ ‘active in games,’ ‘sense of humor,’ 
and ‘older friends.’ 

Reviewing all the foregoing evidence on mutual and unrecip- 
rocated friendships from the standpoint of composite scores and 
of teacher ratings taken separately, the following traits were 
found to be most important: ‘welcomed,’ ‘friendly,’ ‘enthusiastic,’ 
‘happy,’ ‘laughter,’ and ‘active in recitations.’ 

Next in importance were the traits of ‘daring’ and ‘good-looking.’

In all the remaining traits involved in this study no significant 
differences were found between the very mutual and the very 
unreciprocated groups. 

It will be observed that more traits (twelve as compared to 
seven) were found to have little or no value in differentiating 
between mutual and unreciprocated friendships than were found 
to be significant in differentiating between most and least popular 
children on the basis of general social acceptance. This means 
that it is easier to describe traits important to general group 
acceptance than it is to isolate the traits which are essential in 
attracting one individual to another particular individual. It 
may also be observed that in both the final summaries given above
there are twice as many significant traits from the friendly
syndrome as from the other one. This could be interpreted to mean
that the traits essential to direct inter-personal contacts—called
the friendly syndrome—are more important than those in the
other syndrome, characterized chiefly by ‘leadership’ and ‘social
aggressiveness.’ The comparison is a fair one, since there is almost
exactly the same number of traits in each category in the total scale.

It would be inappropriate to conclude this article without a 
brief evaluation of the trait approach to the study of personality. 
In recent years there has been much emphasis placed on the 
concept of the total personality as an integrated, functional 
unit which is more than the sum of its parts. We have been 
told that an individual is accepted by others not primarily 
because of certain traits, but because of the total impression his 
personality makes on others. Furthermore, it has been shown 
that the meaning of any trait cannot be determined from knowl- 
edge of that trait alone, but only from its position and inter- 
relationships in the personality as a whole. As a matter of fact, 
the evidence of this study can be interpreted as bearing out the 
above assertions, even though it was not gathered in such a way 
as to have direct bearing on these concepts. 

In the first place, it will be recalled that it was only when the 
upper and lower fourths in general social acceptance were com- 
pared that statistically reliable differences were obtained, and, 
even then, only eleven traits out of twenty showed a reliable 
difference. This must mean that a person working with one of 
the groups of children used in this study would not be quickly 
or easily impressed with the trait differences between pupils of 
high, medium, and low social acceptance. 

Furthermore, it will be recalled that although there was a high 
degree of similarity between the very mutual friends in some 
traits, there were, nevertheless, many exceptions even in these 
same traits. From the standpoint of teacher ratings taken alone, 
the agreement was still less. 

However, in spite of evidence bearing on what some call ‘the 
fallacy of the trait approach’ to the study of personality, it is 
necessary to study traits. Would it not be foolish to deny that 
some traits are more important than others in determining the 
social acceptance of the total personality? The evidence of the
present study shows that some traits are more important than
others. After all, if psychologists are to make a contribution to 
child development, they must be able to do something more 
than repeat phrases about the importance of the whole person- 
ality. There must be analysis of parts as well as emphasis on 
the integrated whole. Efforts must be made to discover inter- 
relationships within the whole and, if possible, the syndromes or 
clusters of traits which are most essential for certain purposes as, 
for instance, social success or achievement in various occupations. 

Human personality is so complex that it must be studied from 
every angle, and with all possible methods if the greatest progress 
in understanding is to be made. 


CONCLUSIONS 


From the data of this study the following conclusions may be 
drawn: 

A child is well accepted in a group much more because of what
he is and what he does which wins the admiration of others than
because of what he refrains from doing; in other words,
strong, positive personality traits are more important than
negative virtues. From this statement it follows that any type
of moral or religious education which places great emphasis upon
docility, nicety, and submission to authority may be a handicap
to a child’s social acceptance.

Popularity is not the superficial thing it is often assumed to be, 
but is rather tied up with the most basic traits of personality 
and character. From this it follows that the winning of friends 
is not nearly as easy as popular writers would have their readers 
believe, but is, instead, the consequence of a good general 
development and preparation for all the problems of life. Strong, 
positive traits and friendly attitudes seem to be about equally 
important, but it is possible that the latter are more important 
than the former. 

The socially strong child is generally attracted to others who 
are likewise socially strong. 

Although it is no doubt true that liking and disliking people is 
not due primarily to particular traits, but is due to the impression 
which one total personality makes upon another total personality, 
it is still necessary to study traits in order to discover which 
kinds are most important for certain purposes. 


COMPARATIVE TEST SCORES OF
NEGRO AND WHITE SCHOOL CHILDREN 
IN RICHMOND, VA.* 


FRANK C. J. McGURK 
Children’s Memorial Clinic, Richmond, Va. 


INTRODUCTION 


In regions where there are large groups of Negroes and whites, 
each of which is relatively isolated socially, and particularly 
isolated in educational processes, the question of differences in 
intellectual ability, as far as can be measured by existing 
intelligence tests, becomes more than an academic problem for the 
clinical psychologist. 

The necessity of having some measuring device suited to the 
Negro was what prompted this work. In a survey of the 
psychological literature from 1932 to 1941, inclusive, the writer 
found that the majority of the investigators in the field of race 
differences agree that the measuring devices commonly used are 
not valid when used with the Negro; that there has been entirely 
too much bias; that selection of samples has not been scientific; 
that the differences found are largely differences in cultural 
background, or the result of speed differences; in short, the general 
feeling is that whatever differences appear are not differences in 
innate ability. 

This study is not concerned with the question whether the
Negro is or is not as intelligent as the white; it is concerned with
the scores which the two races make on three standard tests:
the Chicago Non-Verbal Examination, the Myers Mental Measure,
and the Otis Self-Administering Test of Mental Ability.
Form A of the Otis was used throughout, in both the Higher and
Intermediate forms. From Grade IV to Grade VI, inclusive, the
Intermediate was used; from Grade VII to Grade XI, inclusive,
the Higher was used. The raw score of the Otis was translated
into a ‘T’ score according to the directions for scoring the test.
Hereafter, these tests will be referred to as CNV, Myers, and
Otis, respectively.

* The writer wishes to thank those who aided him in making this study.
He is especially grateful to the Richmond Community Council of Richmond,
Virginia, for the appropriation with which the material for testing was
bought; to the Works Progress Administration (Official Project
65-1-31-2517) for assistance in scoring and tabulating; and to the Children’s
Memorial Clinic for the space and time contributed. The kindness of the
Richmond School Board in permitting the testing of these children is
appreciated.

The writer is indebted to Nadia Danielevsky for reviewing the statistics
used in this work; to Prudence Kennedy Grantham and Mary Healy Shaw
for supervising the W. P. A. workers and for the many extra hours they spent
tabulating and sorting; and to Paul S. Siegel, Davidson College, for reading
the manuscript and for statistical assistance.

In the spring term of 1940, when the testing was done, there
were 6475 white males, 6950 white females, 2879 Negro males,
and 3710 Negro females on the active rolls in the City of
Richmond between Grades IV and XI inclusive. Grades I, II, and
III were not included in the study.

An alphabetical list was made of the pupils in each grade 
within a given school. Only the active pupils were included in 
the lists. Where there were two or more classes of the same 
grade in a school, they were combined into one list, so that, 
theoretically, for any given school only one class of a given grade 
existed for sampling. By selecting every tenth name from each 
grade list, separately for each school, a ten per cent proportional 
stratified sample was secured. Males and females were not 
separated in selecting the sample, but, since sampling was based 
on the school, Negroes and whites automatically were separated. 
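
The selection procedure just described (pool the classes of a grade within a school, alphabetize, take every tenth name) can be sketched as below. The text does not say at which position in each list the count began, so the `start` offset is an assumption, as are the function name and the input format of (school, grade, name) tuples.

```python
from collections import defaultdict


def every_tenth_name(pupils, step=10, start=0):
    """Proportional stratified sample by taking every `step`-th name.

    pupils: iterable of (school, grade, name) tuples for active pupils.
    Classes of the same grade in one school are pooled into one list,
    alphabetized, and sampled separately for each school and grade.
    `start` (hypothetical) is the position at which counting begins.
    """
    lists = defaultdict(list)
    for school, grade, name in pupils:
        lists[(school, grade)].append(name)
    sample = {}
    for key, names in lists.items():
        names.sort()
        sample[key] = names[start::step]
    return sample


# Hypothetical rolls: 25 pupils in Grade IV and 9 in Grade V of one school.
rolls = [("A", "IV", f"Pupil {i:02d}") for i in range(25)]
rolls += [("A", "V", f"Pupil {i:02d}") for i in range(9)]
print(every_tenth_name(rolls))
```

Because each school and grade is sampled separately, each stratum contributes roughly one pupil in ten, which is what makes the sample proportional.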

In order to determine whether the sample chosen represented 
the school population, the Chi-square test of the significance of 
the difference between the number sampled by grade and the 
expected number at each grade (the basis for the latter being the 
number of children reported by the teachers as active on their 
rolls) was made. It was expected that the same percentage of 
children would be obtained by the sampling procedure for each 
grade as existed in the school population. 
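
The comparison rests on Pearson's goodness-of-fit statistic: the expected number in each grade is the sample size multiplied by that grade's share of the active rolls, and the statistic sums the squared departures of the observed counts from the expected ones. A sketch with hypothetical counts (the grade-by-grade roll figures are not given in this section):

```python
def expected_counts(roll_counts, n_sampled):
    """Expected sample size per grade if the sample mirrors the rolls."""
    total = sum(roll_counts)
    return [n_sampled * r / total for r in roll_counts]


def chi_square(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical rolls for three grades and a hypothetical sample of 40.
rolls = [100, 200, 100]
sampled = [12, 18, 10]
exp = expected_counts(rolls, sum(sampled))
print(exp, chi_square(sampled, exp))
```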

Computing the percentage of children for each grade from the 
active rolls of the teachers, and applying these percentages to the 
number of children sampled in each race, estimates of the expected 
number for each grade were obtained. Calculating Chi-square 
between the expected and the actual sample, the sum of Chi- 
squares of 21.578 was obtained for the whites, and 8.616 was 
obtained for the Negroes. In each case there were fifteen degrees 
of freedom, and, entering the table of Chi-square with fifteen 
degrees of freedom, it was found that, for the whites, a
Chi-square of 21.578 could be expected by chance between ten and
twenty times in one hundred, and that the Chi-square of 8.616 
could occur by chance between eighty and ninety times in one 
hundred. In neither case can the sampling distribution be said 
to differ significantly from the expected distribution. Table 1 
shows the number of children sampled. 
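
Instead of entering a printed Chi-square table, the upper-tail probability can be computed directly from the regularized incomplete gamma function. The sketch below uses only the standard library and the series expansion of the lower incomplete gamma function; it is an illustration of the table lookup, not the procedure the writer followed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom >= x).

    Computes 1 - P(df/2, x/2), where P is the regularized lower
    incomplete gamma function, summed by its power series.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower


# The two values reported above, each with fifteen degrees of freedom.
print(round(chi2_sf(21.578, 15), 3))  # whites: should lie between .10 and .20
print(round(chi2_sf(8.616, 15), 3))   # Negroes: should lie between .80 and .90
```

Both probabilities fall well above conventional significance levels, in agreement with the conclusion that the sample does not depart from the expected grade distribution.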


TABLE 1.—NUMBER OF CHILDREN SAMPLED BY AGE AND GRADE

              White     White     Total     Negro     Negro     Total
              Males   Females    Whites     Males   Females     Negro

Age
9                29        31        60        10        11        21
10               72        69       141        22        29        51
11               81        77       158        45        62       107
12               78        72       150        59        60       119
13               98       104       202        36        64       100
14               99        93       192        60        62       122
15               83       106       189        36        42        78
16               80        76       156        23        40        63
17               55        48       103        11        13        24
Total*          675       676      1351       302       383       685

Grade
IV               69        62       131        53        54       107
V               100        82       182        67        62       129
VI              108        91       199        56        69       125
VII             101        80       181        38        67       105
VIII             94       103       197        42        53        95
IX               81       103       184        29        33        62
X                78        86       164        16        30        46
XI               87       100       187         7        30        37
Total*          718       707      1425       308       398       706

* Disparity in totals caused by exclusion from age groups of children whose
CA was 18 or more, but who are included in grade totals.


On the assumption of high correlation between age and grade, 
it was concluded that since the sample by grade was an adequate 
description of the school population by grade, the sample by age 
was also an adequate description of the age distribution of the 
school population. 
All three tests were given at one sitting, with a short rest period
included so as to reduce fatigue. The order of administration
was varied, also, so that each of the three tests was given first,
second, and third equally often. All the children in any one school
were tested at the same time, which generally resulted in groups of
fewer than thirty-five children, with a grade range of three
or four grades. This was not true of the senior high schools,
where as many as one hundred children were tested at one time.
The testing was done entirely by the writer and one assistant, so
that neither tester tested one grade group completely.
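
One simple way to give each test first, second, and third equally often is to rotate the three tests cyclically across testing sessions, forming a Latin square. The text does not describe how orders were assigned to particular groups, so the rotation below is an assumption offered only as an illustration.

```python
TESTS = ["CNV", "Myers", "Otis"]


def rotated_orders(tests):
    """Cyclic rotations: each test appears once in each serial position."""
    return [tests[i:] + tests[:i] for i in range(len(tests))]


for order in rotated_orders(TESTS):
    print(order)
```

If sessions cycle through these three orders, any practice or fatigue effect is spread evenly over the three tests.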

The reliability of the sample was tested by comparing the
mean of one half of each of the groups with the mean of the other
half. Each age group sample from nine to seventeen, and each
grade group sample from IV to XI, was split into two
random halves; the means were calculated, and the difference
between them was tested for significance by the ‘T’ statistic. In no
case was there a significantly reliable difference between the means
of the halves of the samples. This was true for both the Negroes and
the whites. In all cases, the value of ‘T’ fell below the value
required at the .01 level, the chosen level of significance.

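
The split-half check can be sketched as a random division of one group's scores followed by a two-sample t statistic. The text does not state which form of the statistic was used; the pooled-variance form below is an assumption, and the scores are placeholders.

```python
import math
import random


def random_halves(scores, seed=None):
    """Shuffle a copy of the scores and split it into two halves."""
    shuffled = list(scores)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def pooled_t(a, b):
    """Two-sample t statistic using a pooled estimate of the variance."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))


# Placeholder scores for one age or grade group (not data from the study).
group_scores = [41, 55, 38, 62, 47, 50, 44, 58, 39, 53, 49, 60]
first, second = random_halves(group_scores, seed=1)
t = pooled_t(first, second)
df = len(first) + len(second) - 2
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The obtained value would then be compared with the tabled value of t at the .01 level for the given degrees of freedom.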
RACIAL DIFFERENCES 


Many of the investigators in the field of racial differences
reported differences in mental ability (as measured by test
scores) between the races which were in favor of the whites.
Others, writing critiques of the experiments, criticised the methods,
the tests used, the sampling procedure, and the interpretations
put on the results. There seems to be almost universal belief
that differences in mental ability do exist; there is no universality
of belief about the causes of the differences.

Since this work is based on the Negro-white performance in 
Richmond, the data of other investigators have little direct 
bearing. Whatever attempts the schools in Richmond have 
made to investigate this difference are not reported. The question
here is whether, in the Richmond public schools, the performance
of the Negroes differs from the white performance on
the tests used, and the range selected. In this connection, the 
quantity of the difference, its direction, and its reliability are to 
be investigated. 
The method of analysis of variance was used to investigate this
difference. Table 2 shows that the Negro scores are consistently
below those of the whites. Suppose, however, it be assumed that
there is no difference between the average score of the Negro and
the average score of the white; that is, that the observed
differences are chance differences.


TABLE 2.—AVERAGE SCORES FOR THE CNV, MYERS, AND OTIS
BY AGE AND GRADE

Age           9    10    11    12    13    14    15    16    17 Total
Chicago Non-Verbal:
  White      87    95   104   108   117   122   127   135   135   116
  Negro      67    70    79    90    88    94   104   100   111    90
Myers Mental Measure:
  White      40    43    48    49    53    56    59    63    66    54
  Negro      23    26    30    36    36    38    46    42    45    36
Otis S-A Test:
  White      39    42    46    47    51    54    58    62    63    52
  Negro      34    33    36    40    40    42    46    46    51    41

Grade        IV     V    VI   VII  VIII    IX     X    XI Total
Chicago Non-Verbal:
  White      86    98   108   114   121   130   134   141   117
  Negro      66    78    87    96    98   108   116   116    90
Myers Mental Measure:
  White      38    44    50    51    57    60    61    70    54
  Negro      25    30    35    38    42    47    52    49    37
Otis S-A Test:
  White      37    42    48    50    54    60    62    67    53
  Negro      32    35    38    42    44    49    52    58    41


Following the usual procedure for the analysis of variance, the
total variation between the scores at nine different age levels (or
eight different grade levels) obtained by the whites and Negroes
is divided into two parts: variation between races and variation
within races. When the two sums of squares thus computed
are divided by their respective degrees of freedom, two
estimates of variance are obtained which, under the null hypothesis,
come from the same universe. If the classification by race does not
divide the children into two significantly different groups, the
ratio of the two variances should be as close to unity as chance
variation allows. This ratio is designated as a/b,
where ‘a’ is the mean square variance between races, and ‘b’ is
the mean square variance within races.

The analysis can be carried one step further. Since the scores
are cross-classified by race and by age (or by race and by
grade), the variation between ages and within ages (or
between grades and within grades) can also be computed.
Subtracting the variation between ages (or between grades) from
the within-race variation yields a new measure of variation that is
not ascribable to either race or age (or grade) differences; thus the
sum of squares attributable to chance is decreased. At the same
time, the degrees of freedom available for computing the
residual variance (called the ‘discrepance’) are reduced. The new
ratio is the former variance between races divided by the
‘discrepance.’ In this way, the influence of grade, in groupings
by grade, and the influence of age, in groupings by age, are
eliminated. This ratio is designated a/d, where ‘a’, as before,
is the mean square (variance) between races, and ‘d’ is the
‘discrepance’ as defined above.
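
The computation of the two ratios can be illustrated with a short Python sketch. For illustration only, it assumes the analysis is run on a race-by-age table with one score per cell, here the rounded CNV averages of Table 2. The authors' working data may have differed, and the function name ratios_of_variance is hypothetical.

```python
def ratios_of_variance(rows):
    """Return (a/b, a/d) for a two-way table with one score per cell.

    rows[i][j] is the score of group i at level j (an age or a grade).
    a = mean square between groups (rows)
    b = mean square within groups
    d = 'discrepance': within-group variation minus between-level
        variation, divided by the reduced degrees of freedom
    """
    r = len(rows)       # number of groups (2 in the text)
    c = len(rows[0])    # number of age or grade levels
    grand = sum(sum(row) for row in rows) / (r * c)
    row_means = [sum(row) / c for row in rows]
    col_means = [sum(row[j] for row in rows) / r for j in range(c)]

    # Between groups: df = r - 1.
    ss_between = c * sum((m - grand) ** 2 for m in row_means)
    # Within groups: df = r * (c - 1).
    ss_within = sum((x - m) ** 2 for row, m in zip(rows, row_means) for x in row)
    # Between levels (ages or grades): df = c - 1.
    ss_levels = r * sum((m - grand) ** 2 for m in col_means)

    a = ss_between / (r - 1)
    b = ss_within / (r * (c - 1))
    # Residual after removing level variation: df = (r - 1) * (c - 1).
    d = (ss_within - ss_levels) / ((r - 1) * (c - 1))
    return a / b, a / d


# Chicago Non-Verbal averages at ages 9-17, from Table 2.
group_1 = [87, 95, 104, 108, 117, 122, 127, 135, 135]  # white
group_2 = [67, 70, 79, 90, 88, 94, 104, 100, 111]      # Negro
f_ab, f_ad = ratios_of_variance([group_1, group_2])
print(f"a/b = {f_ab:.1f}, a/d = {f_ad:.1f}")
```

With these rounded means the sketch gives a/b of about 11.0, in line with the chart's entry for total white vs. total Negro on the CNV by age. The a/d it produces does not reproduce the chart's 197.4, however. The example therefore shows the arithmetic of the method rather than a recomputation of the published values.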

Chart 1 shows the level of significance (P) for each of the F’s
obtained by the analysis of variance of the racial differences.
When age or grade variation is removed, the obtained F’s are high
in every case and exceed the values required for the .001 level of
significance. This means that ratios this large would be observed
less than once in a thousand times under the null hypothesis.
Therefore, the null hypothesis must be rejected in every case,
and it can be concluded that the mean score of the whites differs
significantly from the mean score of the Negroes on all three tests.

The question with which this discussion started is now 
answered. There is a difference between the mean score made by 
the white and the Negro children enrolled in the Richmond Public Schools
on each of the three tests used. The quantity of the differences 
can be estimated from Table 2. The whites make higher average 
scores than the Negroes on all tests, and at all ages and grades. 
According to the analysis of variance, the racial differentials are 
significant beyond the .001 level.


CHART 1. RATIOS OF VARIANCE (F) OBTAINED IN THE ANALYSIS OF VARIANCE
OF RACIAL DIFFERENCES

Group                                 Test      a/b      a/d
White Males vs. Negro Males,          CNV      11.4    112.8
  by Age                              Myers    24.6     66.5
                                      Otis      8.6     45.9
White Males vs. Negro Males,          CNV       3.5    687.0
  by Grade                            Myers     7.4     50.3
                                      Otis      2.3    104.7
White Females vs. Negro Females,      CNV      10.5    171.4
  by Age                              Myers    14.6    240.7
                                      Otis      8.2     75.2
White Females vs. Negro Females,      CNV       5.6     43.9
  by Grade                            Myers     7.1    107.5
                                      Otis      3.7     58.9
Total White vs. Total Negro,          CNV      11.0    197.4
  by Age                              Myers    18.6    303.4
                                      Otis      8.4     83.2
Total White vs. Total Negro,          CNV       5.2    512.6
  by Grade                            Myers     8.2    120.9
                                      Otis      3.3    133.1

Note: The first ratio of variance tabulated is the ratio a/b; the second
is the ratio a/d. Every a/d exceeds the value required for the .001
level of significance.


CLINICAL IMPLICATIONS OF THE USE OF PSYCHOLOGICAL TESTS ON
NEGRO GROUPS 


This study indicates that there is a large and statistically
reliable difference between the average scores of whites and
Negroes in the Richmond Public Schools on the tests used here.
That such a difference could occur in other groups and with other
tests is not beyond the realm of possibility, especially in view of
the findings of other investigators. There is some indication that
the racial differences tend to disappear in the northern states,
where segregation is less stringent and where each race has
approximately the same school equipment, teaching methods, etc.
There is evidence also that the racial difference disappears with
increased length of residence of the Negro group. In the
North, the scores made by Negroes who had resided for long
periods in, or who were native to, a given locality are closer to
those of the whites than are the scores of Negroes who had resided
a short time in that locality.

Whatever the causes of the difference, it does exist in Richmond,
and probably in many other southern cities. The clinical
psychologist, whose responsibility it is to diagnose the Negro
children, and who may strongly influence the future of the children
diagnosed, should become familiar with the performance of
the local group of Negroes and adjust the tests to the average
Negro performance. In certain sections of the country this
may apply equally well to the white group. The usual method in
Richmond was to add an arbitrary 10 points to the IQ made by
the Negro child and compare it with the white norms. The
fallacies of this method need no emphasis. If a re-standardization
of the tests used is required, and some such reliable
method probably is necessary, it should be done.

Using the published norms, a great many Negro children were
scoring in either the borderline or feebleminded range. This
was not true of the white children. An analysis of variance was
made comparing the published norms with the median scores
calculated from the Richmond data. For the CNV and Otis, the score
corresponding to the 50th percentile for each age as published by
the authors was used; for the Myers, however, the published
norms are in terms of median ages, each with a corresponding
score which is neither a mean nor a median score.

The results of the analysis of variance indicate that there is no
difference between the published norms and the Richmond white
data for the CNV and Otis, but on the Myers the median scores of
the Richmond white children were higher than the published scores.
This difference was significant at the .001 level. Because the
published Myers norms are the scores made by the median child of a
given age, whereas the Richmond scores for the whites are calculated
medians for each age level, the difference may not be interpreted
as superior performance of white children in Richmond. Because
of this, and the absence of a difference between the CNV and
Otis published norms and calculated medians, it can reasonably
be accepted that whatever differences appeared were chance
differences.

However, there are large and significant differences between 
the Negro medians and the published age norms. In every case, 
the differences were significant at the .001 level, and the Negro 
medians were much lower than the published norms. (See 
Table 3). 

It can be seen from this that none of the published norms is a 
reliable tool for the clinical psychologist who deals with groups 
like the Negro group in Richmond. The highest average score 
made by the Negroes in this group was approximately at the 
eleven- or twelve-year level on the published norms. 

At this writing, a set of percentile tables is being constructed 
for use with these tests for Negro populations like that of Rich- 
mond. At present, they are crude, but are being used in the 
Children’s Memorial Clinic in Richmond. These tables accept 
the average performance made on the tests used as the average for 
a given age or grade, and the variation is based on the actual 
variation as calculated for each test. Such tables are being
devised for both age groups and grade groups. 
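
A table of this kind can be approximated in a few lines of code. The construction of the tables is not described in detail, so the sketch below assumes that scores within a local age or grade group are roughly normal. It takes the group's own mean as the average and its standard deviation as the variation, then converts a score to a percentile. The function name local_percentile and the sample scores are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def local_percentile(score, reference_scores):
    """Percentile of `score` within a local reference group.

    Assumes the group's scores are roughly normal; the group's own mean
    and sample standard deviation define the distribution.
    """
    dist = NormalDist(mean(reference_scores), stdev(reference_scores))
    return 100 * dist.cdf(score)


# Hypothetical scores of one local age group on one test.
age_group = [28, 31, 33, 35, 36, 38, 40, 41, 44, 47]
print(round(local_percentile(38, age_group)))
```

An empirical percentile rank (the share of the group scoring below a given score) would avoid the normality assumption. The normal form is used here only because the text speaks of an average and a measured variation.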

The use of separate norms for the Negroes has several advantages.
It accepts the Negro as he is, and describes the performance of the
individual Negro in terms of the average performance as found for
the Negro, not in terms of the average performance of the whites
minus an arbitrary number. Further, it permits the classification
of a Negro child in relation to others of his race, culture,
opportunities, etc.: whether he does or does not do what others of
his circumstances do, and by how much, is measured in terms of what
others like him are doing. Having grade percentiles allows the
grade placement of a child among others with whom he can more
easily compete.


TABLE 3. PUBLISHED AGE NORMS, WHITE MEDIANS, AND NEGRO MEDIANS
FOR THE CNV, MYERS, AND OTIS


         Chicago Non-Verbal            Myers Mental Measure          Otis S-A Test
         Published   Richmond medians  Published   Richmond medians  Published   Richmond medians
Age      norms (2)    White   Negro    norms (21)   White   Negro    norms (22)   White   Negro
9            90          89      64        29          38      27        36          38      34
10          100          95      69        35          43      26        41          41      33
11          110         104      78        39          46      30        46          46      35
12          118         108      91        44          48      35        50          47      39
13          122         116      86        48          53      35        53          51      38
14          123         122      94        50          54      38        57          54      41
15                                         49          57      44        60          58      44
16                                         50          63      43        62          63      47
17 up                                                                    63          63      51

Behind the use of a different set of norms for the Negroes is 
the philosophy that one is judged by the company one keeps. 
If the Negroes of the South are to be segregated, if they are to be 
deprived, if they are to be slovenly because of lack of opportunity, 
teaching, and ambition, and if their lives are to be lived in such 
totally different surroundings, with different chances than their 
white brothers, they then should be judged according to the 
standards which are common to their life. These standards 
probably will change as their lot in life becomes better, and as 
they absorb the white man’s culture. Eventually, there may be 
no reason for a separate measuring stick.


ON ESTIMATES OF TEST RELIABILITY


LEE J. CRONBACH* 
State College of Washington 


Critical discussions of the reliability problem have been 
prominent in educational literature for twenty years. While 
many questions involved have by now been settled, interest 
has recently been intensified by renewed criticism of the custom- 
ary techniques. These attacks have been supplemented by 
numerous proposed substitutes, some of which are, in turn, 
receiving criticism. It now appears desirable to survey the
questions raised regarding the split-half (Spearman-Brown)
technique, to determine whether that method must be abandoned.


CRITICISMS OF THE SPLIT-HALF METHOD 


The earliest attacks on the technique dealt with the validity
of the formula itself. Empirical studies have, on the whole, 
demonstrated that the formula does predict the reliability 
obtainable on lengthening a test, and Dunlap has shown logically 
that when reliability is conceived as a property of the measuring 
instrument (eliminating from consideration day-to-day vari- 
ability of the subject) the split-half method is superior to the 
retest method.* 
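
For reference, the Spearman-Brown formula is given below in its usual two-half form and in its general form for a test lengthened k times. Here $r_{hh}$ denotes the correlation between the two half-tests and $r_{tt}$ the estimated reliability of the whole test; these symbols are chosen for illustration. A half-test correlation of .70, for instance, steps up to about .82.

$$
r_{tt} = \frac{2\,r_{hh}}{1 + r_{hh}}, \qquad
r_{kk} = \frac{k\,r_{11}}{1 + (k - 1)\,r_{11}}, \qquad
\frac{2(.70)}{1 + .70} = \frac{1.40}{1.70} \approx .82 .
$$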

Another, more damaging, line of attack has dealt with the
undeniable fact that, since any test of 2n items may be split
into two parts in $\frac{(2n)!}{2(n!)^2}$ different ways, it is not
possible to consider the estimate obtained by a random or odd-even
split as ‘the’ test reliability.‘* Brownell has shown that the obtained
coefficients may vary greatly, depending on whether the investigator
obtains a ‘lucky’ split or not.? In the face of such an error,
no great dependence may be placed on the estimates obtained.
This may be particularly serious, Brownell has noted, when the
split-halves are not reasonably comparable (as when the most
difficult items fall within one of the halves). Dunlap,’ and





* In preparing this paper, the writer has drawn largely upon experiences
with the Evaluation Staff of the Eight-Year Study of the Progressive Education
Association. The parallel-split method was used by that group in
many situations over a period of years.

Jackson and Ferguson,* among others, have demonstrated
mathematically that the split-half method gives erroneous
estimates whenever the assumption that the halves are of equal
difficulty, variability, and reliability is not met. The latter
writers go so far as to say, of a test where the mean ‘odd’ score
was lower than the mean ‘even’ score: “Obviously, one cannot
use the split-half method in determining the reliability”® p. 112*.


THE KUDER-RICHARDSON METHOD 


To avoid the lack of uniqueness of the split-half method, and 
the familiar objections to the retest technique, Kuder and 
Richardson have proposed a ‘method of rational equivalence’. 
This method has been widely adopted, and is in fact in general 
use in Army test construction.'4 Blommers and Lindquist 
criticized this trend, pointing out that many users (including the 
present writer) have failed to demonstrate that a test satisfies 
the basic assumption of the Kuder-Richardson method—that the 
matrix of item intercorrelations is of rank one.'! This, equivalent 
to saying that the items measure only one general variable plus 
specific factors, is manifestly untrue for most achievement tests 
and omnibus types of ability test. Kelley has presented the- 
oretical objections to substitution of the Kuder-Richardson 
formulas for the customary reliability coefficient.'' He notes 
that the former essentially measure the homogeneity or ‘coher- 
ence’ of items; while this is related to accuracy of a measuring 
instrument, it is not a complete estimate of that accuracy. He 
does point out the usefulness of a statistic indicating homo- 
geneity, as a supplement to the reliability estimate. 
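
The best known of the Kuder-Richardson formulas, formula 20, uses only the item difficulties and the variance of total scores. The Python sketch below assumes items scored 0 or 1 and uses population variances throughout; the function name kr20 is an illustrative choice. The second example shows that two items whose scores run in opposite directions drive the formula below zero. This is the kind of negative value that can arise with short or heterogeneous tests.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for items scored 0 or 1.

    responses: one list of item scores per examinee.
    KR-20 = k/(k - 1) * (1 - sum(p*q) / variance of total scores),
    where p is the proportion passing an item and q = 1 - p.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n  # population variance
    sum_pq = 0.0
    for i in range(k):
        p = sum(person[i] for person in responses) / n
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_total)


# Three items passed in a strict order of difficulty: 0.75.
print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))
# Two items whose scores run in opposite directions: about -2.67.
print(kr20([[1, 0], [1, 0], [1, 1], [0, 1]]))
```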





*In a statistical argument too lengthy to reproduce here, these same
writers conclude that, because the absolute difference in half-scores
(xy − x34) is not independent of the total score, an “element of
‘spuriousness’ will generally be present in the correlation of the scores
on the odd and even items of a test and seems to be an inherent weakness
of the method. . . . It is doubtful if the split-half method can be used
with any degree of confidence in estimating reliability coefficients.”
°*pp. 54-59. One may, of course, obtain peculiar results in any scatter
diagram by considering only the sub-sample having very high or low
total scores. This does not argue that the correlation for the entire
sample is spurious, although x − y is related to x + y. In the
split-half problem, xy is independent of x34; so long as this obtains,
the charge of spuriousness is unjustified.

The principal advantages claimed for the Kuder-Richardson
method are ease of calculation, uniqueness of the estimate obtained
for a given set of responses, and conservatism. (Those who
regard the parallel-form technique as ideal consider the estimates
derived by the Spearman-Brown formula spuriously high.)
While conservatism has advantages in research, in this case it 
leads to difficulties. It is helpful for a worker to know that a 
true statistic must be greater than a certain obtained value; it is, 
however, most confusing if the magnitude of the underestimate 
is unknown. Working with short tests, and preliminary forms 
of new instruments, the writer has repeatedly encountered this 
difficulty. On more than one occasion, the Kuder-Richardson 
estimate has been a sizeable negative value. This is of course 
meaningless, since complete heterogeneity would yield a coeffi- 
cient of .00. While split-half estimates may be negative, these 
can be considered sampling deviates from low positive values, 
probably due to poor splits. If measurements are independent, 
a negative reliability coefficient is inconceivable, since reliability 
is essentially self-correlation. The most confusing aspect of the 
Kuder-Richardson procedure is our lack of knowledge as to the 
magnitude of error. The authors, it is true, have found excellent 
correspondence to values obtained by the split-half method,'* 
but the test on which their comparison was based has a reliability 
near .90. Furthermore, to establish a test satisfying the assump- 
tions underlying their formula, the investigators rather artificially 
created tests having nearly uniform item difficulty, as may be 
noted from their Table I. Good empirical results for a well- 
standardized test were also shown by Froehlich.* For poorer 
tests, the two estimates may deviate markedly; Jackson and 
Ferguson®”-** present Kuder-Richardson coefficients which are 
quite different from split-half estimates based on the same data. 
When a split-half coefficient is .825, one is justified in thinking 
well of a short test; but the Kuder-Richardson estimate from the 
same data is the probably over-conservative value of .575. 
Several such disagreements occur, even though the assumption of 
unit rank seems likely to be satisfied in the tests under study. 
Examination of the Jackson-Ferguson data indicates that the 
magnitude of error bears no regular relation to the size of the 
coefficient.

Another drawback of excessively conservative estimates of
reliability appears when one considers the intended uses of the 
results. These possible purposes include: (1) evaluation of the 
test, (2) testing the significance of differences between individuals, 
(3) testing the significance of gains, (4) correcting correlation 
coefficients for attenuation, and (5) estimating communality in 
factor analysis. As shown above, confusion may arise in use 
(1) from over-conservatism, but this type of parsimony need not 
be a source of error. In use (2), one may discard differences 
which are truly significant, if one over-estimates the error of 
measurement; it is now increasingly recognized that such a 
procedure is an error, which may retard research as greatly as 
failure to test significance. In use (3), the regression formula 
must be used; the appearance of the estimate of reliability in this 
formula is an additional opportunity for error. An extreme 
underestimate would cause gains of good students to appear 
unduly significant, and would underrate the significance of gains 
made by poor students. The attenuation formula becomes nearly
worthless if reliability estimates are poor; overconservatism 
causes corrected correlation coefficients to reach unreasonably 
large values, often greater than 1.00.'* In the fifth instance, 
using a low estimate of reliability will eliminate over-factoring, 
but may cause one to overlook truly significant factors. 
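
Uses (2) and (4) can be made concrete. The standard error of measurement and Spearman's correction for attenuation both take the reliability coefficient as an input. The sketch below uses hypothetical coefficients to show how under-estimated reliabilities widen the error band and push a corrected correlation past 1.00. The function names are illustrative.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Spread of observed scores about true scores; grows as reliability falls."""
    return sd * sqrt(1 - reliability)


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated correlation between true scores on tests x and y."""
    return r_xy / sqrt(r_xx * r_yy)


# Hypothetical: obtained r = .60 between two tests.
print(correct_for_attenuation(0.60, 0.80, 0.70))  # about .80 with adequate estimates
print(correct_for_attenuation(0.60, 0.45, 0.50))  # above 1.00 with low estimates
print(standard_error_of_measurement(10, 0.91), standard_error_of_measurement(10, 0.75))
```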

These difficulties indicate that the Kuder-Richardson formula 
is not desirable as an all-purpose substitute for the usual tech- 
niques. This reopens the question whether a suitable estimate 
can be developed. Methods such as Jackson® proposes, based on 
analysis of variance, have much merit, but they require a retest, 
or use of a parallel form. Some procedure must be found which 
can estimate test reliability using the data from a single testing; 
this is needed, for instance, when developing a new type of test, or 
in many cases where one must analyse data from routine testing. 
As critics of the split-half technique have pointed out, however, 
the estimate must be relatively unique, so as not to be subject to 
chance factors. 


THE PARALLEL-SPLIT METHOD 


While some writers think of the split-half and odd-even tech- 
niques as synonymous, this is not so. The odd-even is but one of 
the $\frac{(2n)!}{2(n!)^2}$ possible ways of dividing one test into two sets of n
items. The present problem is to select one of the many esti- 
mates; such a value might be based on (1) a completely random 
split, (2) the median [or other parameter] of estimates from many 
such splits, (3) the highest estimate obtained among many splits, 
or (4) a controlled split designed to make halves comparable. 
The first of these does not work well; too often, the estimate is 
faulty because the halves obtained are not comparable, as the 
Spearman-Brown formula requires. Such a procedure is impar- 
tial, but not unique. The second proposal is defensible, but 
laborious. Furthermore, a median or mean is not a good approxi- 
mation of the ‘true’ reliability, since it is based upon both good 
and poor (non-comparable) splits.‘ The maximum value of the 
entire distribution of split-half coefficients is unique, for a given 
sample of responses, and may be suitable, especially as the split 
yielding the highest correlation ordinarily gives the most nearly 
comparable halves. The drawback in this procedure lies in the 
difficulty of finding the true maximum. The largest of several 
estimates from random splits should be a good approximation, 
but always one might by chance obtain several splits below the 
maximum. Such an obtained value is not unique, and requires 
an undesirable amount of labor. 
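
For a short test all the splits can simply be enumerated, which makes the options listed above concrete. The Python sketch below uses a hypothetical matrix of six items and five examinees. It computes the Spearman-Brown estimate for each of the $\frac{(2n)!}{2(n!)^2} = 10$ splits and reports the lowest, median, and highest values. Item 0 is always placed in the first half so that each split is counted once, and the function names are illustrative.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import median


def pearson(x, y):
    """Product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)


def all_split_half_estimates(responses):
    """Spearman-Brown estimate for every split of 2n items into halves of n."""
    k = len(responses[0])
    n = k // 2
    estimates = []
    # Fixing item 0 in the first half counts each unordered split once.
    for rest in combinations(range(1, k), n - 1):
        half_a = {0, *rest}
        a = [sum(p[i] for i in half_a) for p in responses]
        b = [sum(p[i] for i in range(k) if i not in half_a) for p in responses]
        estimates.append(spearman_brown(pearson(a, b)))
    return estimates


# Hypothetical 0/1 responses: five examinees, six items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
est = all_split_half_estimates(responses)
print(len(est), "splits; expected", comb(6, 3) // 2)
print(f"lowest {min(est):.3f}, median {median(est):.3f}, highest {max(est):.3f}")
```

On these data, the odd-even split (items 0, 2, 4 against 1, 3, 5) gives about .95. Splitting the first three items against the last three gives about .77. The gap shows how far a single arbitrary split can wander.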

This leaves for consideration what may be called the ‘parallel- 
split’ technique, by analogy to the parallel-form method. In the 
latter method, one measures the subject twice, keeping the condi- 
tions of testing as similar as possible, and using two sets of items 
which have been deliberately made alike in form, content, 
difficulty, and range of difficulty. That is, the two tests are made 
as parallel as possible. In the single test, two tests are formed by
splitting. These can be made similar in form, content,
difficulty, and range of difficulty, so that the test halves are com- 
parable. The resulting estimate is the most defensible reliability 
coefficient obtainable in a single testing. Its advantages are: 
(1) It is unique, within the range of variation involved in pairing 
test-items (see procedure below). This range is small, especially 
if the test is of considerable length. (2) The half-tests are com- 
parable, as required by the assumptions of the Spearman-Brown 
formula. Dunlap has shown that the errors involved in the use
of that formula are reduced to a minimum when the two means,
sigmas, and reliabilities of the subtests become equal®. (3) When
the test is so split that the halves measure the same behaviors— 
achieved by dividing on the basis of content—the halves are 
representative of the whole, and the coefficient is most meaning- 
ful. (4) The procedure requires little more work than the random 
or odd-even split. 

To obtain a parallel split, the investigator requires an item- 
analysis. This is made, using a representative, but small, sample 
of papers not used in the actual correlation. Using this analysis, 
pairs of items are selected which test the same behaviors or 
knowledge‘, and which are of roughly equal difficulty. ‘Testing
the same behavior’ means not only similarity of content but
similarity of response behavior. A completion item could not
be paired with a recognition item. In a true-false test, the true
items must be divided equally between the two halves, since it is
known that responses to true and false items may be uncorrelated‘.
In other tests, other variables apart from explicit content
may have to be considered. Naturally, some items will have no
close mates, but by adjusting these one can balance the two
halves of the test in difficulty and variability. Dunlap has provided
a simple test to show whether the resulting halves do
measure the same mental function or complex of functions’.
Each half is split into fourths, and the tetrad of intercorrelations
is computed. If the tetrad difference $r_{12}r_{34} - r_{13}r_{24}$ is
within sampling expectancy of zero, the halves are comparable.
This refinement, together with an inspection as to whether the
halves have equal means ($M_1 = M_2$) and equal standard deviations
($\sigma_1 = \sigma_2$), substantially eliminates the judgment element from
the procedure. For casual work, however, such a test probably 
can be dispensed with. 
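
The pairing step can be sketched in Python. The paper leaves pairing to the investigator's judgment, so the rule used here is an assumption. Within each content category, items are sorted by difficulty (p, the proportion passing in the preliminary item analysis) and adjacent items are treated as mates. The easier member of each pair goes to whichever half currently has the lower expected score, and an item without a mate is placed the same way at the end. The helper tetrad_difference computes r12 r34 − r13 r24. All names and the sample items are hypothetical.

```python
from collections import defaultdict


def parallel_split(items):
    """Divide items into two halves matched on content and difficulty.

    items: list of (item_id, content_tag, p), where p is the proportion
    passing the item in a preliminary item analysis.
    Returns the two lists of item ids and the two sums of p.
    """
    by_tag = defaultdict(list)
    for item_id, tag, p in items:
        by_tag[tag].append((p, item_id))

    halves, totals = ([], []), [0.0, 0.0]
    leftovers = []
    for tag in sorted(by_tag):
        group = sorted(by_tag[tag])           # hardest (lowest p) first
        if len(group) % 2:
            leftovers.append(group.pop())     # easiest item has no mate
        for (p_hard, id_hard), (p_easy, id_easy) in zip(group[::2], group[1::2]):
            low = 0 if totals[0] <= totals[1] else 1
            halves[low].append(id_easy)       # easier item to the harder half
            totals[low] += p_easy
            halves[1 - low].append(id_hard)
            totals[1 - low] += p_hard
    for p, item_id in sorted(leftovers, reverse=True):
        low = 0 if totals[0] <= totals[1] else 1
        halves[low].append(item_id)
        totals[low] += p
    return halves, totals


def tetrad_difference(r12, r34, r13, r24):
    """Dunlap's check: near zero when the quarters measure one function."""
    return r12 * r34 - r13 * r24


items = [
    ("v1", "vocabulary", 0.90), ("v2", "vocabulary", 0.85),
    ("v3", "vocabulary", 0.60), ("v4", "vocabulary", 0.55),
    ("c1", "computation", 0.70), ("c2", "computation", 0.65),
    ("c3", "computation", 0.40), ("c4", "computation", 0.35),
]
(half_a, half_b), (sum_a, sum_b) = parallel_split(items)
print(half_a, half_b, round(sum_a, 2), round(sum_b, 2))
```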

One should of course consider the objections which may be
voiced to the parallel split. The estimate obtained is not
over-conservative, but it should not be considered too high. Kuder
and Richardson state that the random-split method is “commonly
supposed to give estimates that are too high”!*, p. 152.
This thinking is based on the fact that a retest coefficient is often 
greater than the Spearman-Brown attempt to predict what the 
test reliability would be if the half-test were lengthened by 
adding a section ‘just like’ the first. This condition is satisfied 
only in the parallel split. Essentially, under the parallel-split 
one obtains a parallel-form reliability, without including the 
lowering effect of changes in the subject from one testing to the
next. Spurious factors must be avoided, to be sure. If the split 
scores are derived from the papers used in the item analysis, 
correlation of errors would artificially raise the estimate. In 
speed tests, no split method can be used. 

Some will contend that the elimination of quotidian variability 
is a spurious factor in any split technique. This is a confusion of 
validity with reliability; the reliability coefficient should be an 
index of the measuring technique, not of the subject. A mental 
test can only evaluate the behavior of the subject in a given 
stimulus situation. If carelessness, or situation set, or illness, 
affects his score, that score merely says that his behavior was 
higher or lower in quality during that testing than at some other 
time. The subject’s behavior is not likely to be constant from 
day to day, even if we are interested in some assumed constant 
trait (say, ‘intelligence’). The measuring stick may measure 
perfectly his behavior; the error lies in assuming that behavior is 
perfectly correlated with the hypothetical trait. This is, of 
course, a matter of validity. A coefficient reflecting function 
shift may be useful5, p. 452, but it does not indicate the precision 
of the measuring instrument. 

The deliberate placing of similar items in the two halves will 
be regarded as a spurious element by some. This is spurious only 
if items are so linked that one’s answer to one question pre- 
determines one’s response to the next'®. The problem is well 
illustrated in a diagnostic vocabulary test where thirty words 
from algebra, carefully selected from vocabulary lists for their 
importance, were tested. Six items in the test dealt with each 
word’. To split the test, one may place alternate groups of six 
items in each half, or may split within each group, placing three 
items in each half. The estimate from the latter procedure has 
been called spuriously high. However, the proper test is found 
in the formulation of the Spearman-Brown problem: What corre- 
lation would we have if we added another test just like the first? 
We might lengthen the test by covering thirty more words; if so, 
the former split is proper, as the halves are ‘just like’ in the 
same way as the two tests. But in this case we are testing knowl- 
edge of these thirty words, because they are important, and if the 
test were lengthened, it would be by adding six more items on 
each of these words. Therefore, the latter split is preferable, and 
not spurious. Whether a given pairing of items introduces a
spurious factor may be answered by the question: Would such a 
pairing be used in creating a parallel form of this test? Those 
who object to a planned split regard test construction as a matter 
of random sampling of items from among all the items dealing 
with the variable to be studied. This is not the case; nearly 
every test contains sub-variables: divisions of subject-matter or
skills, particular behaviors in a personality test, or attitudes 
toward aspects of the central object in attitude tests. The divi- 
sion of items among these sub-variables is not done at random 
in modern test construction; usually, a deliberate plan is followed, 
a certain number of items being allotted to each sub-division. 
The parallel-split method employs a similar concept in obtaining 
reliability. 
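
The two ways of splitting the algebra vocabulary test described above can be written out explicitly. For illustration only, the sketch assumes that the 180 items are numbered word by word, with six consecutive items per word. The function names are hypothetical.

```python
def split_by_word_groups(n_words=30, per_word=6):
    """Former split: alternate whole six-item word groups between halves."""
    half_a, half_b = [], []
    for w in range(n_words):
        group = range(w * per_word, (w + 1) * per_word)
        (half_a if w % 2 == 0 else half_b).extend(group)
    return half_a, half_b


def split_within_words(n_words=30, per_word=6):
    """Latter split: three of each word's six items go to each half."""
    half_a, half_b = [], []
    for w in range(n_words):
        group = list(range(w * per_word, (w + 1) * per_word))
        half_a.extend(group[0::2])
        half_b.extend(group[1::2])
    return half_a, half_b


for split in (split_by_word_groups, split_within_words):
    a, b = split()
    words_a = {i // 6 for i in a}
    print(split.__name__, len(a), len(b), "words in first half:", len(words_a))
```

Under the second division every one of the thirty words is represented in both halves. That matches the writer's reasoning that the test would be lengthened by adding items on the same words.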

The use of a planned, somewhat subjective, split is unorthodox, 
in comparison to the random or odd-even divisions. That this 
tradition is based on a fallacy is demonstrated by Kelley". 


A belief that two or more measures of a mental function exist is 
prerequisite to the concept reliability, and further, not only that they 
exist but that they are available before a measure of reliability is 
possible. . . . $X_1$ and $X_2$ must be judged a priori to be equally
trustworthy measures of this ability. . . . This act of a priori judgment is
inherent and, though it can be voided so far as a combination of items 
is concerned by fractionating the measure (splitting the test), this only 
changes the size of the element upon which the judgment is made. . . . 

The usual split-half, or similar form, reliability coefficient is a precise 
measure of the extent to which differences in the $X_1$ scores are
predictable by a measure of this same degree of excellence, for $X_2$ is,
according to judgment, such a measure. (Italics mine.) The issue of
‘correlation between errors’ is not involved. . . .

We must not forget that an act of judgment has been demanded. 
This act is of the same sort as that of the test maker in putting together 
two or more exercises into a single test. . . . It should be a much less 
severe tax upon judgment to split a test with many items into com- 
parable halves than to draw up the items in the first instance so as to 
measure the same function. 


SUMMARY 


The usual split-half method for determining test reliability has 
been severely criticized because the chance element in splitting 
makes the reliability coefficient in error by an undetermined 
amount. The Kuder-Richardson estimate, advanced as an alternative,
is unique, but is seriously overconservative, again by an
undeterminable amount. Some better procedure for estimating 
reliability using data from a single testing is required. 

Writers have repeatedly pointed out that the Spearman-Brown 
approach is justified only if the two halves of the test are truly 
comparable. This paper has outlined a procedure for obtaining 
halves of guaranteed comparability. It requires that items be 
divided into two ‘parallel’ halves on the basis of item difficulty 
and content. The resulting value should be a close and rela- 
tively unique estimate of the accuracy with which the test meas- 
ures whatever its particular items measure in these subjects at 
this time. Any other coefficient, which includes the effect of 
changes in the subject from day to day, or in the test from form 
to form, is a measure of validity. 

This proposal is not a completely new idea. In article after 
article, such writers as Kelley, Holzinger, Brownell, and Dunlap 
have stressed the importance of using the split-half technique 
only with comparable halves. This paper has suggested explicitly 
that halves be made comparable, and has suggested the steps and 
cautions needed in such a procedure. 

This does not eliminate from consideration many other types 
of coefficient. The retest coefficient, the parallel form technique, 
the index of sensitivity, the Kuder-Richardson estimate of test 
homogeneity, and others have their own meaning and value. 
But for problems where an estimate of the self-consistency of a 
test is desired—this applies to all the uses of reliability coefficients 
listed above—the parallel-split appears most suitable. 

Further studies of several sorts are needed. The sampling 
distribution of the many split-half coefficients for a given test 
should be determined. The extent of error in the Kuder- 
Richardson formula, for tests of different types and lengths, 
requires study. The parallel-split method should be examined 
empirically, particularly to determine how greatly judgment in 
pairing items may affect the coefficient. Perhaps some worker, 
by analysing a set of data showing changes in pupils’ test scores 
during instruction, using different techniques for obtaining a 
reliability coefficient, will provide concrete evidence that obtain- 
ing the most meaningful estimate of reliability is of crucial impor- 
tance to the test user. 
AN EVALUATION OF A TESTING PROGRAM
IN EDUCATIONAL PSYCHOLOGY 


C. L. MORGAN 
Purdue University 
AND 
C. C. STEINMAN 
Williamsport, Indiana, High School 


INTRODUCTION 


A need has been felt for a testing program that would assist in 
predicting the success of students in educational psychology. 
The purpose of this investigation has been to evaluate a proposed 
testing program. Increased registration and resulting large 
classes make it extremely difficult for an instructor to recognize 
needs of individual students. If reasonable prediction were 
possible, based upon performance in certain tests administered at 
the beginning of the semester, it would be very useful in counseling 
students concerning their difficulties in the study before lack of 
time prohibited anything being done about them. 

Of course, such a problem is not new. May¹ found that
academic success, indicated by 'honor points', could be predicted
to the extent of $R_{1.23} = .83$, in which the criterion was (1) honor
points and the variables were (2) general intelligence and (3) number
of study hours. It was found in this study that the R of .83 was
not increased by the addition of another variable, (4) high-school
marks, to the problem.

Edds and McCall² found that it was possible to predict college
marks, indicating success, to the extent of $R_{0.123} = .81$, in which the
criterion was (0) college marks and the variables were (1) mental
ability, (2) English ability, and (3) high-school marks.

Condit³ found that it was possible to predict college scholarship
of freshmen within one-third of a letter-grade by use of the
Thurstone Psychological Examination administered at registration
time. It was also found that as good prediction could be
secured by the use of a reliable achievement test administered at
the same time. It was indicated that what the achievement test
failed to measure psychologically it supplied through its measurement
of application.

¹M. A. May, "Predicting Academic Success," Journal of Educational
Psychology, xiv (1923), 429-440.
²J. H. Edds and W. M. McCall, "Predicting the Scholastic Success of
College Freshmen," Journal of Educational Research, xxvu1 (1933), 127-130.
³P. M. Condit, "Prediction of Scholastic Success by Means of Classification
Examinations," Journal of Educational Research, xxx (1929), 331-335.

Adkins¹ conducted a study concerning the prediction of high-school
scholarship, based upon performance on certain intelligence tests.
Of the tests considered, the Morgan Mental Test had the highest
correlation with scholarship achievement, and the Kuhlmann-Anderson
test the next highest. With these two combined, an R of .69 was
obtained for prediction based upon eighth-grade work, and an R of
.75 for prediction based upon ninth-grade work.

Adams² conducted a study to determine the possibility of
predicting high-school and college success at the elementary-school
level. The best results were obtained from a combination of mental
ability, performance on the Stanford Achievement Test, and
educational age. He found little justification for extensive
prediction based upon the products of a minimum testing program
during the elementary grades.

Deputy³ found that the Pintner-Cunningham Mental Test gives
the best single means of predicting first-grade reading achievement,
but that when weighted scores of a test of visual-visual association,
a test of word selection, a test of visual-auditory association,
and a test of content comprehension recall were added, the predictive
power increased appreciably. Prediction was accurate enough to be
useful in work with beginning readers.

Krieger⁴, realizing the need for predictive measures in graduate
education, conducted a study involving a population of candidates
for master's and doctor's degrees. Very few definite conclusions
were drawn from this study, but the need for further research in
this field was clearly indicated.





¹D. C. Adkins, "Efficiency of Certain Intelligence Tests in Predicting
Scholarship Scores," Journal of Educational Psychology, xxviii (1937), 129-134.
²F. J. Adams, "Predicting High-school and College Records from Elementary-school
Data," Journal of Educational Psychology, xxix (1938), 56-66.
³E. C. Deputy, Predicting First-grade Reading Achievement, Bureau of
Publications, Teachers College, Columbia University, 1930.
⁴L. B. M. Krieger, Prediction of Success in Professional Courses for
Teachers, Bureau of Publications, Teachers College, Columbia University, 1930.
Drake¹ conducted a five-year study in an effort to arrive at an
improved means of selecting college students. In this study hope 
was expressed for better measuring instruments of factors, other 
than intelligence, which condition and influence the achievement 
of college marks. 

All that is expected of this investigation is that it shed some 
light upon the possibilities of predicting the success of students in 
educational psychology, in order that instructors of this course 
may be in a position to serve individual students’ needs best.

Sophomore students in educational psychology served as subjects
for this study. Complete data were obtained for one hundred sixty
of these students.


THE PREDICTIVE TESTS 


For ease and clarity of reference, the tests have been grouped
into two classes: predictive tests and achievement tests. There are
four predictive tests: (1) Otis Test of Mental Ability², (2) Iowa
Silent Reading Test³, (3) English Vocabulary Tests for High-school
and College Students⁴, and (4) Educational Psychology Vocabulary
Tests⁵.

The reliabilities of these tests were determined by the chance- 
half method. These reliabilities were corrected by the Spear- 
man-Brown Prophecy Formula. 
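The correction mentioned here can be sketched in a few lines. This is a minimal illustration of the general Spearman-Brown formula, r_kk = k·r/(1 + (k - 1)·r), with k = 2 for a split-half coefficient. The function names are illustrative, and the half-test correlation of .70 is a hypothetical value, since the paper reports only the corrected coefficients.

```python
def spearman_brown(r_part: float, k: float = 2.0) -> float:
    """Estimated reliability of a test k times as long as the part whose
    self-correlation is r_part (k = 2 for the split-half case)."""
    return k * r_part / (1 + (k - 1) * r_part)


def half_test_r(r_full: float) -> float:
    """Invert the k = 2 correction: the half-test correlation that
    would yield a corrected coefficient of r_full."""
    return r_full / (2 - r_full)


# Hypothetical half-test correlation, not a figure from the study.
print(round(spearman_brown(0.70), 2))
# Half-test correlation implied by the corrected .92 reported for the Otis test.
print(round(half_test_r(0.92), 2))
```

A half-test correlation of .70 would be raised to about .82, and the .92 reported for the Otis test corresponds to a half-test correlation of about .85.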


TABLE I.—DATA CONCERNING THE PREDICTIVE TESTS

Test                            N       M       SD     r     σr
Otis Test of Mental Ability    160   175.35   17.95   .92   .01
Iowa Silent Reading Test       160   149.54   27.93   .91   .01
English Vocabulary Test        160    93.27   10.83   .95   .008
Psychology Vocabulary Test     160    37.87    8.73   .85   .02


Table I gives the number, mean score, standard deviation, 
coefficient of reliability, and standard error of r of the predictive 
tests. 
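The σr column of Table I appears to follow the usual large-sample formula for the standard error of a correlation coefficient, σr = (1 - r²)/√N. The paper does not name the formula, so this is an inference from the figures. The sketch below recomputes the column with N = 160.

```python
import math


def standard_error_r(r: float, n: int) -> float:
    """Large-sample standard error of a Pearson r: (1 - r**2) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)


# Reliability coefficients from Table I; N = 160 for every test.
reliabilities = {
    "Otis Test of Mental Ability": 0.92,
    "Iowa Silent Reading Test": 0.91,
    "English Vocabulary Test": 0.95,
    "Psychology Vocabulary Test": 0.85,
}
for test, r in reliabilities.items():
    print(f"{test:30s} r = {r:.2f}  SE = {standard_error_r(r, 160):.3f}")
```

To three places the results are .012, .014, .008, and .022. Rounded as in the table (.01, .01, .008, .02), they agree with the printed column.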


¹C. A. Drake, A Study of an Interest Test and an Affectivity Test in Forecasting
Freshman Success in College, Bureau of Publications, Teachers College,
Columbia University, 1931.

²A. S. Otis, Otis Group Intelligence Scale, World Book Company, 1919.

³H. A. Greene, A. N. Jorgensen, and V. H. Kelley, Iowa Silent Reading
Tests, World Book Company, 1927.

⁴W. T. Markham, English Vocabulary Tests for High-school and College
Students, Public School Publishing Company, 1928.

⁵C. L. Morgan, Educational Psychology Vocabulary Tests, Purdue University.
(Unpublished.)
The ideal test battery, of course, is a group of tests that have
high correlations with the criterion and low correlations with one another. It is
interesting to note that psychology vocabulary has a low cor- 
relation with mental ability (.19) and silent reading (.37), but a 
reasonably high correlation with English vocabulary (.64). 

Inter-correlations were computed in order to determine the
relationships of the predictive tests to one another. These
inter-correlations are given in Table II.


TABLE II.—INTER-CORRELATIONS OF THE PREDICTIVE TESTS

                               Iowa Silent   English      Psychology
                               Reading       Vocabulary   Vocabulary
Otis Test of Mental Ability    .70 (.04)*    .51 (.06)    .19 (.08)
Iowa Silent Reading Test                     .60 (.06)    .37 (.07)
English Vocabulary Test                                   .64 (.05)

* Standard error of r.


MEASURING PSYCHOLOGY ACHIEVEMENT 


Five achievement tests were used in this study. Four of them
were unit tests administered at the close of their respective units,
and the fifth was the final test, made up of items covering the
entire course content. These tests were made up of objective
items, including true-false, multiple-choice, and matching exercises.
The reliabilities of these tests were obtained by the chance-half
method and were corrected by the Spearman-Brown Prophecy Formula.
Table III gives the number, mean scores, standard deviations,
coefficients of reliability, and standard errors of r for these tests.

TABLE III.—DATA CONCERNING THE ACHIEVEMENT TESTS

Test                                  N      M      SD     r     σr
Psychology Achievement Test No. I    160   47.67  10.17   .55   .05
Psychology Achievement Test No. II   160   25.01   8.28   .65   .05
Psychology Achievement Test No. III  160   46.28   8.61   .49   .06
Psychology Achievement Test No. IV   160   44.98   9.15   .72   .04
Psychology Achievement Test No. V    160   25.30   6.24   .23   .08

It was necessary in some way to arrive at a single measure 
which would indicate a student’s success in educational psy- 
chology. This might have been accomplished by converting 
each raw score into its z-score or T-score equivalent and then 
combining each student’s set of z- or T-scores. This, obviously, 
would have necessitated a great deal of labor. The possibility 
of simply combining the raw scores of tests I, II, III, and IV 
(test V being omitted because of its low reliability) was con- 
sidered. Although such a procedure would slightly lower the 
reliability, and, therefore, the validity of the criterion, it was 
decided that this very small reduction would not be great enough 
to render the use of the raw-score totals unsound. Remmers and
Geiger¹ found $R_{X.ABC} = .719$ when (X) grade-point averages
were correlated with tests in (A) mathematics, (B) psychology, 
and (C) English. When raw-score totals of these tests were 
correlated with grade-point averages the r was found to be .688; 
a difference of .031. This difference is not great enough to 
prohibit the use of raw-score totals for predictive purposes. 
Success in educational psychology, then, was indicated by the 
combined raw scores of achievement tests I, II, III, and IV. 
Table IV gives the number, mean score, standard deviation, 
coefficient of reliability, and standard error of r of the Psychology 
Achievement Totals. 
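The T-score alternative that the authors set aside can be sketched briefly. The sketch uses made-up raw scores for three hypothetical students (not data from the study). It converts each test to T-scores (mean 50, standard deviation 10) before summing, so every test contributes on the same scale. A raw-score total instead lets tests with larger standard deviations count for more.

```python
from statistics import mean, pstdev


def t_scores(raw):
    """Convert one test's raw scores to T-scores (mean 50, SD 10)."""
    m, s = mean(raw), pstdev(raw)
    return [50 + 10 * (x - m) / s for x in raw]


# Hypothetical raw scores on tests I-IV, one entry per student.
tests = {
    "I": [58, 47, 38],
    "II": [33, 25, 17],
    "III": [55, 46, 38],
    "IV": [54, 45, 36],
}

raw_totals = [sum(scores) for scores in zip(*tests.values())]
t_totals = [sum(scores) for scores in zip(*(t_scores(v) for v in tests.values()))]
print(raw_totals)
print([round(t, 1) for t in t_totals])
```

With these particular numbers both methods rank the students identically. The two can diverge when the tests differ markedly in spread, which is the labor-versus-precision trade-off the authors weighed.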


TABLE IV.—DATA CONCERNING THE PSYCHOLOGY ACHIEVEMENT
TOTALS

                                 N      M       SD     r     σr
Psychology Achievement Totals   160   165.40   25.40   .82   .03


We see from Table IV that the coefficient of reliability computed
for the totals of the four achievement tests is .82. A comparison
with Table III shows that this is greater than the reliability of
any one of tests I, II, III, and IV. This, of course, is largely
due to the increase in the number of items.
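One way to see why the total is more reliable than its parts is the standard formula for the reliability of a sum of component tests, r_c = 1 - Σσᵢ²(1 - rᵢᵢ)/σ_c², where σ_c² is the variance of the sum. This is a swapped-in technique, not the chance-half method the authors used. The sketch below feeds it the standard deviations and reliabilities of Table III and the intercorrelations of Table V.

```python
import math
from itertools import combinations

# Tests I-IV: standard deviations and reliabilities (Table III).
sd = [10.17, 8.28, 8.61, 9.15]
rel = [0.55, 0.65, 0.49, 0.72]
# Intercorrelations of the tests (Table V), keyed by index pairs.
r = {(0, 1): 0.09, (0, 2): 0.09, (0, 3): 0.36,
     (1, 2): 0.32, (1, 3): 0.44, (2, 3): 0.50}

# Variance of the sum: component variances plus twice every covariance.
var_total = sum(s ** 2 for s in sd) + 2 * sum(
    r[i, j] * sd[i] * sd[j] for i, j in combinations(range(4), 2))
# Error variance of each component is sigma_i**2 * (1 - r_ii).
error_var = sum(s ** 2 * (1 - rr) for s, rr in zip(sd, rel))
composite_rel = 1 - error_var / var_total

print(f"SD of total ~ {math.sqrt(var_total):.1f}")
print(f"reliability of total ~ {composite_rel:.2f}")
```

This gives a reliability of roughly .79 and a standard deviation of about 24.9. These figures are close to, though not identical with, the .82 and 25.40 of Table IV, which were obtained by a different method. The estimate is still higher than every single-test coefficient in Table III.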


¹H. H. Remmers and H. E. Geiger, "Predicting Success and Failure of
Engineering Students in the Schools of Engineering in Purdue University,"
Studies in Higher Education xxxvi, Bulletin of Purdue University, 1940,
Table VI, p. 17.

500 The Journal of Educational Psychology 


To study the relationships among the four achievement tests
that make up the total achievement score, intercorrelations were
computed. Table V gives the results of these computations.

TABLE V.—INTER-CORRELATIONS OF THE SEPARATE
ACHIEVEMENT TESTS

                             Test No. II   Test No. III   Test No. IV
Achievement Test No. I        .09 (.08)*    .09 (.08)      .36 (.07)
Achievement Test No. II                     .32 (.06)      .44 (.06)
Achievement Test No. III                                   .50 (.06)

* Standard error of r.


These tests were designed to measure achievement on the following
units of work: Test I, Mental Hygiene; Test II, Statistics;
Test III, Tests and Measurements; and Test IV, Problems and
Principles of Learning.

It is evident that there is no high degree of correlation between 
the units of work making up the course in educational psychology. 
The highest correlation coefficient (.50) is found between Tests 
and Measurements and Principles of Learning. 


THE USE OF ONE VARIABLE IN THE PREDICTION OF ACHIEVEMENT 


Coefficients of correlation were computed for each predictive
variable with Psychology Achievement. Table VI gives the
coefficients of correlation of the four single predictive variables
with Psychology Achievement.

TABLE VI.—CORRELATIONS OF THE SINGLE VARIABLES
WITH PSYCHOLOGY ACHIEVEMENT

                               Psychology
                               Achievement
Otis Test of Mental Ability    .54 (.06)*
Iowa Silent Reading Test       .52 (.06)
English Vocabulary Test        .58 (.05)
Psychology Vocabulary Test     .63 (.05)

* Standard error of r.

Fisher's method of determining the significance of differences
between r's was used.¹ None of the differences among the r's of
Table VI was found to be statistically significant.





¹E. F. Lindquist, Statistical Analysis in Educational Research, Houghton
Mifflin Company, 1940, pp. 210-218.
THE USE OF TWO VARIABLES IN THE PREDICTION OF ACHIEVEMENT


Table VII gives the multiple correlation coefficients for the
six combinations of predictive variables with psychology achievement,
together with the standard errors of R. It was found by
the Fisher z-technique that the R of .76 of the Otis Test of Mental
Ability and Psychology Vocabulary with Psychology Achievement
(Table VII) is significantly higher, at the five-per-cent level, than
the r of .63 of Psychology Vocabulary with Psychology Achievement
(Table VI).
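Both significance checks in this part of the paper can be sketched with Fisher's z-transformation: the comparison among the r's of Table VI and the R-versus-r comparison just described. This sketch treats each pair of coefficients as if it came from independent samples of 160. Because all the coefficients here come from the same students, that is a simplification, and the source does not say exactly how the comparison was set up.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's transformation z = artanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def z_difference(r1: float, n1: int, r2: float, n2: int) -> float:
    """Normal deviate for the difference between two correlations,
    treating them as if they came from independent samples."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se


CRITICAL_5_PERCENT = 1.96  # two-tailed, normal distribution

# R for Otis + Psychology Vocabulary vs. r for Psychology Vocabulary alone.
z_best = z_difference(0.76, 160, 0.63, 160)
# Largest gap among the single-variable r's of Table VI (.63 vs .52).
z_single = z_difference(0.63, 160, 0.52, 160)
print(f"{z_best:.2f} {z_best > CRITICAL_5_PERCENT}")
print(f"{z_single:.2f} {z_single > CRITICAL_5_PERCENT}")
```

The first deviate is about 2.26, beyond the five-per-cent point. The second is about 1.46, below it. Both results are consistent with the conclusions stated in the text.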


TABLE VII.—MULTIPLE CORRELATIONS OF COMBINATIONS OF
TWO VARIABLES WITH PSYCHOLOGY ACHIEVEMENT

                                             Psychology
                                             Achievement
Otis Test of Mental Ability
  Iowa Silent Reading Test                   .57 (.05)*
Otis Test of Mental Ability
  English Vocabulary Test                    .64 (.05)
Otis Test of Mental Ability
  Psychology Vocabulary Test                 .76 (.03)
Iowa Silent Reading Test
  English Vocabulary Test                    .62 (.05)
Iowa Silent Reading Test
  Psychology Vocabulary Test                 .70 (.04)
English Vocabulary Test
  Psychology Vocabulary Test                 .73 (.04)

* Standard error of R.


We see from Table VII that the Otis Test of Mental Ability
together with the Psychology Vocabulary Test is the best combination
of two for the prediction of Psychology Achievement.
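The closed-form expression for a multiple correlation with two predictors, R² = (r₁y² + r₂y² - 2·r₁y·r₂y·r₁₂)/(1 - r₁₂²), shows why this pair does best. Both tests correlate well with the criterion but only .19 with each other, which is the "ideal battery" pattern described earlier. The sketch below recomputes two entries of Table VII from the r's of Table VI and the intercorrelations of Table II. The paper does not show its computing method, so this is an independent check rather than the authors' procedure.

```python
import math


def multiple_r_two(r1y: float, r2y: float, r12: float) -> float:
    """Multiple correlation of a criterion y with two predictors, given
    their correlations with y (r1y, r2y) and with each other (r12)."""
    r_squared = (r1y ** 2 + r2y ** 2 - 2 * r1y * r2y * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


# Otis (.54) and Psychology Vocabulary (.63), intercorrelated .19.
best_pair = multiple_r_two(0.54, 0.63, 0.19)
# Otis (.54) and Iowa Silent Reading (.52), intercorrelated .70.
otis_iowa = multiple_r_two(0.54, 0.52, 0.70)
print(f"{best_pair:.3f} {otis_iowa:.3f}")
```

These come to about .762 and .575, compared with .76 and .57 in Table VII. Because the Otis and Iowa tests are highly intercorrelated, each adds little to the other.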


THE USE OF THREE VARIABLES IN THE PREDICTION OF 
PSYCHOLOGY ACHIEVEMENT 


The same procedure was used to find the predictive value of 
three variables as was employed in finding the predictive value of 
two. Table VIII gives multiple correlation coefficients for the 
four combinations. 

It is shown by Table VIII that the addition of another variable
does not make for greater accuracy in the prediction of success in
educational psychology. Higher multiple correlation coefficients
were not secured by adding either silent reading or English
vocabulary to the combination of mental ability and psychology
vocabulary.

TABLE VIII.—MULTIPLE CORRELATION COEFFICIENTS OF
COMBINATIONS OF THREE VARIABLES WITH PSYCHOLOGY
ACHIEVEMENT

                                             Psychology
                                             Achievement
Otis Test of Mental Ability
  Iowa Silent Reading Test
  English Vocabulary Test                    .64 (.05)*
Otis Test of Mental Ability
  Iowa Silent Reading Test
  Psychology Vocabulary Test                 .76 (.03)
Otis Test of Mental Ability
  English Vocabulary Test
  Psychology Vocabulary Test                 .76 (.03)
Iowa Silent Reading Test
  English Vocabulary Test
  Psychology Vocabulary Test                 .70 (.04)

* Standard error of R.

A multiple correlation coefficient, involving all four variables, 
was computed. The correlation was .76 with a standard error of 
R of .03. 

The use of all four variables as a basis for predicting
Psychology Achievement is no better than the use of only two,
namely, Mental Ability and Psychology Vocabulary (Table VII).
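For more than two predictors, the same quantity comes from solving the normal equations R_xx·β = r_xy for the standardized weights β and taking R² = Σβᵢ·rᵢy. The sketch below does this with plain Gaussian elimination, a generic technique chosen for illustration, since the paper does not describe its computing routine. It uses the correlations in Tables II and VI.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def multiple_r(rxx, rxy):
    """Multiple correlation from predictor intercorrelations rxx and
    predictor-criterion correlations rxy."""
    beta = solve(rxx, rxy)
    return math.sqrt(sum(b * r for b, r in zip(beta, rxy)))


# Order: Otis, Iowa Silent Reading, English Vocabulary, Psychology Vocabulary.
rxx = [[1.00, 0.70, 0.51, 0.19],
       [0.70, 1.00, 0.60, 0.37],
       [0.51, 0.60, 1.00, 0.64],
       [0.19, 0.37, 0.64, 1.00]]
rxy = [0.54, 0.52, 0.58, 0.63]

all_four = multiple_r(rxx, rxy)
two_best = multiple_r([[1.00, 0.19], [0.19, 1.00]], [0.54, 0.63])
print(f"{all_four:.3f} {two_best:.3f}")
```

Both values round to .762. The two extra tests raise R only in the fourth decimal place, which agrees with the conclusion stated above.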


SUMMARY AND CONCLUSION 


Variables in addition to Mental Ability and Psychology 
Vocabulary had no effect upon the accuracy with which success 
in educational psychology could be predicted. The coefficient of 
reliability of the criterion, Psychology Achievement, was .82 
(Table IV). The best multiple correlation coefficient was .76, a 
difference of .06. It is thus apparent that little of the variance 
in the criterion measure is unaccounted for by the two best 
predictive measures. 

It is concluded that, of the tests used, Mental Ability, as
measured by the Otis Test of Mental Ability, and Psychology
Vocabulary together are the most practical means of predicting
success in educational psychology.





SOME NOTES ON THE USE 
OF OPTIONAL TEST ITEMS 


ROBERT F. BARRY 
Specialist in Testing, Board of Education, Rochester, N. Y. 


A matter of some importance for those interested in achieve- 
ment tests arises from the use of optional questions. Students in 
colleges or in the upper grades of high schools rather commonly 
encounter test items which permit them to answer, say, any four
out of six, or any five out of seven, or other similar options. But 
for the pupils in the lower high-school grades, the optional idea 
may present some difficulty. Their lack of maturity and experi- 
ence is apt to make unusual directions difficult to comprehend, 
especially under the mental stress of an examination. And 
yet, occasionally this kind of item appears on teacher-made 
examinations. 

This paper attempts to discover what happens with ninth-grade 
pupils (1) in regard to their adherence to directions and (2) in 
regard to the teacher’s treatment of the pupils’ responses when 
optional type directions are not followed accurately. For the 
purpose of this study, two thousand forty-one examination papers 
in ninth-grade social studies were available. Because of the 
scarcity of clerical help, every tenth paper was withdrawn for 
tabulation, making a total of two hundred four. 

The examination contained four groups of optional questions 
headed by the directions presented below: 


1) Each of the following (11) places is located on the map by a 
number. Choose any nine. Before each of the nine places you choose, 
write its number from the map. 


Fourteen places appeared on the map. This group was at the 
very beginning of the examination. 


2) At the right of five of the following put the number of the most 
nearly correct answer. 


Six multiple-choice questions then followed. This group was 
about one third of the way through the test, and followed twenty- 
one similar items without options. 


3) Each of the following (15) places is located on the map by a 
number. Choose any eleven. Before each of the eleven places you 
choose, write its number from the map. 



Twenty-one numbers appeared on the map. This group was in 
the middle of the examination. 


4) At the right of five of these statements, mark T if true or F if false. 


Six true-false items then followed. This group came about 
two-thirds of the way through the examination, immediately 
following twelve items of the ordinary type without options. 

In order to establish an unbiased background, neither pupils nor 
teachers were given in advance any instruction as to how to answer 
or how to rate optional questions. Since there is no problem in 
regard to the pupils who followed directions literally, this paper 
will consider only those who answered more than the required 
number. To what extent do ninth-grade pupils follow these 
optional directions and hence limit the number of their responses? 
In case they answer more items than are required, how do teachers 
mark their responses? 

The two hundred four papers were divided on the basis of total 
examination scores into five categories, commonly called A, B, C, D, 
and E, the proportions in each category being ten per cent, twenty 
per cent, forty per cent, twenty per cent, and ten per cent, respec- 
tively, from high to low. The purpose of this division was to 
obtain some idea of the relationship between achievement on the 
examination and each of the questions of the preceding paragraph. 
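The division into categories can be sketched as a ranking followed by cuts at the cumulative proportions 10, 30, 70, and 90 per cent. The function name and the rounding of cut points are assumptions for illustration, and the scores are hypothetical. Ties are split by original order, which the paper does not address.

```python
from collections import Counter


def assign_categories(scores, proportions=(0.10, 0.20, 0.40, 0.20, 0.10), labels="ABCDE"):
    """Label papers A-E from highest to lowest total score in the given
    proportions. Tied scores are split by original order (stable sort)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    categories = [""] * n
    start, cumulative = 0, 0.0
    for k, (label, p) in enumerate(zip(labels, proportions)):
        cumulative += p
        # The last category takes whatever remains, so every paper is labeled.
        end = n if k == len(labels) - 1 else round(cumulative * n)
        for i in order[start:end]:
            categories[i] = label
        start = end
    return categories


# Hypothetical scores for 204 papers, all distinct.
scores = list(range(204))
print(sorted(Counter(assign_categories(scores)).items()))
```

Exact proportions give 20, 41, 82, 41, and 20 papers. The group sizes shown in Table I (24, 39, 75, 43, and 23) depart somewhat from these figures, and the paper does not say how the boundaries were drawn.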
Table I shows the percentage of pupils who followed directions,
hence limiting their answers to the number of responses required. 
This table is divided not only into the five categories of achieve- 
ment, but also into the four optional groups of items. It is 
interesting to note that the pupils were much more successful in 
handling the first and third groups, both of which contained 
maps and, hence, could not be answered by reading alone, than 
they were with the other two groups. These last two—groups 
two and four—where only about half the pupils followed direc- 
tions, were ones which required only reading, and consequently 
may have been answered rather hurriedly. In addition both 
of these latter groups were preceded by a considerable number of 
reading-type items involving no options. Hence, it may be that 
the pupils had been lulled into a false sense of security by having 
just encountered so many ordinary items. 

Table II shows, for each category and for each group of items,
the success attained by those who attempted to answer more items
than the directions called for. It thus answers the question of how
successful those who exceeded the requirements were. The
results show that only nineteen per cent were successful in their
extra efforts.
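The nineteen per cent is the total number of successful extra attempts divided by the total number of attempts. The sketch below recomputes it, together with the per-category figures, from the totals row of Table II.

```python
# Totals row of Table II: pupils who answered extra items, by achievement category.
succeeding = {"A": 14, "B": 10, "C": 19, "D": 8, "E": 3}
attempting = {"A": 24, "B": 39, "C": 107, "D": 81, "E": 39}


def percent(part: int, whole: int) -> int:
    """Percentage rounded to a whole number, as printed in the tables."""
    return round(100 * part / whole)


by_category = {c: percent(succeeding[c], attempting[c]) for c in succeeding}
overall = percent(sum(succeeding.values()), sum(attempting.values()))
print(by_category, overall)
```

The result is 58, 26, 18, 10, and 8 per cent for categories A through E, and 54 out of 290, or 19 per cent, overall. These figures match the table.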


TABLE I.—PERCENTAGES OF PUPILS WHO FOLLOWED OPTIONAL
DIRECTIONS FOR EACH GROUP OF ITEMS AND BY THE
CATEGORIES OF ACHIEVEMENT

                        24 A     39 B     75 C     43 D     23 E
Group of items         pupils   pupils   pupils   pupils   pupils   Total
First group (map)        83       79       78       67       87       78
Second group             63       70       51       40       30       51
Third group (map)        87       92       75       70       73       78
Fourth group             67       59       53       35       39       50
Average percentages      75       75       64       53       57

Percentage of total who followed directions: 64
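The "Average percentages" row of Table I, and the overall figure of 64, appear to be plain unweighted means of the four group percentages in each column. The paper does not say how the averages were formed, so the sketch below simply tests that inference.

```python
from statistics import mean

# Percentages who followed directions (Table I): categories A-E, then Total.
followed = {
    "first (map)": [83, 79, 78, 67, 87, 78],
    "second": [63, 70, 51, 40, 30, 51],
    "third (map)": [87, 92, 75, 70, 73, 78],
    "fourth": [67, 59, 53, 35, 39, 50],
}
# Average of the four groups, column by column (unweighted).
averages = [round(mean(column)) for column in zip(*followed.values())]
print(averages)
```

The result, 75, 75, 64, 53, 57, and 64, reproduces the printed averages and the overall 64.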





The teachers themselves are also involved in this matter of 
optional directions because of the problem of how to rate those 
responses which exceed the requirements. Readers should 
remember that no special instructions were given to the teachers 
on this point. This permits investigation of their methods of 
rating when the teachers are not bound by instructions and, 
hence, have a free choice of procedure. 

TABLE II.—PERCENTAGES OF SUCCESS ATTAINED BY THOSE WHO
ATTEMPTED TO ANSWER MORE ITEMS THAN THE DIRECTIONS
CALLED FOR

                                                         Totals by
Groups of items           A's    B's    C's    D's    E's   groups of items
First group (2 extra)
  Number succeeding          4      2      2      0      0        8
  Number attempting          4      8     16     14      3       45
  Per cent successful      100     25     13      0      0       18
Second group (1 extra)
  Number succeeding          1      0      0      0      0        1
  Number attempting          9     12     37     26     16      100
  Per cent successful       11      0      0      0      0        1
Third group (4 extra)
  Number succeeding          3      3      9      4      1       20
  Number attempting          3      3     19     13      6       44
  Per cent successful      100    100     47     31     17       45
Fourth group (1 extra)
  Number succeeding          6      5      8      4      2       25
  Number attempting          8     16     35     28     14      101
  Per cent successful       75     31     23     13     13       25
Totals
  Number succeeding         14     10     19      8      3
  Number attempting         24     39    107     81     39
  Per cent successful       58     26     18     10      8

Total success for all who tried any extra answers:
  Number succeeding      54
  Number attempting     290
  Per cent successful    19

Table III shows the two principal methods of rating employed
by the teachers. Notice that when the teachers marked rigidly,
that is, by counting only the first five or nine or eleven responses
according to the credits specified, they were more lenient with the
poorer pupils than they were with the better ones. Slightly more
than half the teachers used this method. On the other hand,
another large group of papers was marked by giving credit for
any responses that were correct, up to the number of credits
allowed. When this method was used, again the teachers were
more lenient with the poorer pupils than with the better ones.

In addition to these two principal procedures, four per cent of 
the papers were rated by deducting all the errors from the number 
of credits specified, which meant that the rating was more rigid 
than was intended. The remaining two per cent of the papers 
were rated by deducting all the errors from the number of items 
attempted, which was more lenient than was intended, even 
making possible more credits than were allowable. 
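The four rating procedures described above can be made concrete with a small sketch. Each function scores one optional group, given the required number of answers k and a list of booleans marking the pupil's responses right or wrong in the order written. The function names are hypothetical labels for the procedures, not terms used in the paper.

```python
def first_k_only(responses, k):
    """Count correct answers among the first k responses only."""
    return sum(responses[:k])


def correct_up_to_k(responses, k):
    """Credit every correct answer anywhere, capped at k."""
    return min(sum(responses), k)


def k_minus_errors(responses, k):
    """Deduct all errors from the number of credits specified."""
    return k - responses.count(False)


def attempted_minus_errors(responses, k):
    """Deduct all errors from the number attempted (may exceed k)."""
    return len(responses) - responses.count(False)


# A hypothetical pupil answers six items where five were required;
# only the second answer is wrong.
answers = [True, False, True, True, True, True]
for rule in (first_k_only, correct_up_to_k, k_minus_errors, attempted_minus_errors):
    print(rule.__name__, rule(answers, 5))
```

For this pupil the procedures give 4, 5, 4, and 5 credits. Had all six answers been right, the last procedure would have awarded 6 credits for a five-credit group, which is the "more credits than were allowable" case.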


TABLE III.—RATING PROCEDURES USED BY THE TEACHERS FOR
THOSE PAPERS WHERE EXTRA OPTIONAL ITEMS
WERE ANSWERED

Teachers' rating procedure       10 A's   30 B's    88 C's    73 D's   36 E's     Total

Counted only the correct
responses on the first 5 or
9 or 11 answers (according to
credits specified in the
directions)                        90%   16 (53%)  48 (55%)    60%   15 (42%)  132 (55%)

Gave full credit for every
correct response up to the
number of credits allowed                  30%     37 (42%)    33%   21 (58%)   91 (39%)

Combined totals of several
other methods employed                     17%        3%        7%              14 (6%)

Totals: Number                     10       30        88        73       36       237
        Percentage                100      100       100       100      100       100

From the results, it appears that the following conclusions are
justified, within the limitations imposed by the data and the
methods employed:


1) Unless pupils are specifically instructed in advance, it is 
unwise to use optional questions because a considerable propor- 
tion of them do not follow the detailed directions which are 
necessary. 

2) The better pupils follow the unusual directions more closely 
than do the poorer ones. 

3) Unless definite procedures are given to the teachers, optional
questions are not rated uniformly.

4) When the rating procedure is not rigidly predetermined,
teachers seem to favor the poorer pupils in comparison with the
better ones, which makes the examination less discriminating
than was intended.


BOOK REVIEWS 


Mark A. May. A Social Psychology of War and Peace. New
Haven, Conn., Yale University Press, 1943, pp. 284. 


This relatively non-technical treatment of the psychological 
problems of war and peace begins with a discussion of these 
reasons or explanations for war which have been advanced at 
one time or another: 1) it is rooted in man’s biological nature; 
2) man (unconsciously?) wants, needs, and admires war; 3) war 
stimulates social variations and selects strong intelligent groups 
for survival; 4) war is a tonic making nations strong and keeping 
them growing; 5) armed conflict is a consequence of power 
politics; 6) war is a means for the perpetuation of groups in 
power; and, 7) war is an inevitable and final test of an issue 
between groups. Lest the reader interpret even this as an
oversimplification, the author hastens to add that wars' "causes
are multiple and complex" (p. 19). Following this introduction
are chapters on: Why War?, War, Peace and Social Learning, 
Learning To Hate and To Fight, Learning To Fear and To 
Escape, Learning To Love and To Defend, Learning To Follow 
Leaders, Aggressive Social Movements, Defensive Social Move- 
ments, and, The Present War and the Future Peace. 

May employs the ‘drive,’ ‘cue,’ ‘response,’ ‘reward’ analysis of 
learning first elaborated at length by Hull and later by Miller 
and Dollard, to explain the acquisition of those habits and 
attitudes and dispositions that result in war. Most of the argu- 
ment is based upon the fundamental thesis, completely convinc- 
ing to the reviewer, that men go to war because they have learned 
to believe that aggression brings rewards. With respect to the 
future May believes we will have peace in the degree that we 
learn loyalties and responsibilities to larger and larger groups. 
Furthermore, education in tolerance and good will toward 
‘different’ people can be carried farther and be more effective 
when these different groups have been brought together under 
one political system. 

In the concluding chapter, The Present War and the Future 
Peace, May explains our actions during 1940 and 1941 largely 
through his interpretations of the Gallup polls. He contends 
that various fears were competing for recognition in action; 
first, a fear of something like a repetition of World War I with 
its disillusionment and lost peace, and, second, a fear of what
might happen if the Allies lost the war this time. The latter
fear took ascendancy about May, 1941. Throughout this section
the reviewer felt that the author departed at times from his
scientific scrutiny of our national emotions and asserted as facts
certain occurrences or interpretations that historians will probably
be quarrelling over for the next century. Some examples are:
"England did need our aid" (p. 250); "In his (Mr. Roosevelt's)
various talks to the nation he made clear where he stood and
why" (p. 240); "All anti-war sentiments disappeared immediately
(after Pearl Harbor)" (p. 254); "The unity that now
exists among the United Nations is based upon common purposes"
(p. 257).

While the point was not overlooked, insufficient emphasis was 
placed by May upon the distinction between what it is that men 
believe brings them reward and what actually does operate in 
their own interests. In the case of war it is not too difficult to 
demonstrate a persistent belief that armed combat has over-all 
good consequences at least as a last alternative. But this is 
subtly and profoundly different from resort to war because "the
group members have learned that aggression pays" (p. 26). In
other words, too little was said about propaganda techniques 
and other forms of official government misrepresentation, (see 
the OWI’s Mr. John Durfee). Yet it is just such methods that 
lead men to ‘believe’ that fantastic rewards and punishments are 
associated with behavior that may be either peaceful or warlike. 
It is this disposition of modern governments, democratic or not, 
to decide in advance what we simple folks should believe, and 
then bend every effort to push us in that direction, that is more 
responsible for wars than even psychologists seem willing to 
grant.
Stephen M. Corey

University of Chicago. 


Dorothy Hutchinson. In Quest of Foster Parents. New York:
Columbia University Press, 1943, pp. 145. 


This treatise, written for case workers and others interested
in problems related to the selection of foster parents, discusses
"the psychology of home finding as it may affect both the worker
and the foster parents." Adoptive, free, and boarding homes
are considered. In discussing the desire for parenthood,
interviewing the foster mother, testimony on qualifications, rejection
of applications for children, problems of wartime and evaluation 
of foster parenthood, the prospective foster parent is treated as a 
human being with individual desires, ambitions, needs and short- 
comings. The emphasis is upon a sympathetic understanding 
of the needs of both foster parents and foster children. Deci- 
sions, nevertheless, must be made without emotional bias and in 
terms of all the facts in any particular case. Psychological 
problems of the case worker are not neglected. 

This book is highly practical. Citations of case studies reveal
actual problems and how they are met. It is obvious that the
author’s evaluations are based upon a broad background of
experience in the field. Although this book will appeal especially to
students and case workers, it may be read with profit by all those
interested in child placement. MILES A. TINKER

University of Minnesota 


JOSEPHINE H. MACLATCHY, Editor. Education on the Air.
Columbus, Ohio: Ohio State University Print Shop, 1942, 
pp. 310. 


Radio is, today, an important means of shaping national 
morale and of providing universal education. The ‘average 
man’ in the United States is probably inclined to overlook this 
fundamental importance of radio as long as he is entertained by 
the majority of programs he listens to. However, to those who 
stop to consider the opinion-forming power that modern radio 
has over the large audience it reaches, it is gratifying to learn 
that administrators in the broadcasting field meet regularly to 
discuss their obligations to a democratic public and to exchange 
opinions on ways of improving programs, not only in terms of 
their ‘audience appeal,’ a prime concern of commercial radio, 
but also in terms of augmenting the educational and cultural 
value of radio. 

Education on the Air is a record of the thirteenth annual meet- 
ing of the Institute for Education by Radio. The meeting was 
held five months after the United States had entered the war. 
Its purpose was to consider the contribution of the broadcasting 
industry to the war during these five months and to urge those 
in the radio industry to concentrate their energy on developing
programs similar to those whose value in producing a ‘Fighting 
America’ had already been proved. Panel discussions were 
held on the following problems of radio: newscasting, radio 
drama, children’s programs, radio in higher education, racial 
broadcasting, agricultural and homemaking broadcasting, and 
religious broadcasting. 

In general, this book records verbatim the discussions which 
took place in the panels. Since it presents much extemporaneous 
material from individual participants at each panel, the reading 
is not always coherent. Conscientious readers must therefore
read through some rather poor material in order not to miss
significant material. Also, as is to be expected in any discussion
involving a large group of participants, questions from the 
audience often divert a speaker from his planned line of argument 
and prevent him from bringing his contribution to a forceful 
conclusion. 

The book definitely awakens one from apathy towards radio 
as an educational tool. Forward-looking instructors will do
well to read Education on the Air for the many helpful and specific 
ideas it presents on the use of radio equipment and programs 
in the classroom. DAVID V. TIEDEMAN

University of Rochester 